Appendix

This patent application includes an appendix (the "Appendix"), which 
contains the source code for the software used in carrying out the examples in 
accordance with the present invention. 

A portion of the present disclosure contains material that is subject to
copyright protection. The copyright owner has no objection to the facsimile
reproduction by anyone of the patent document or the patent disclosure as it
appears in the U.S. Patent and Trademark Office patent files or records, but
otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION 
1. Field of the Invention.



Significant morbidity and mortality are associated with infectious diseases 
and genetically inherited disorders. More rapid and accurate diagnostic methods 
are required for better monitoring and treatment of these conditions. Molecular 
methods using DNA probes, nucleic acid hybridization and in vitro amplification 
techniques are promising methods offering advantages over conventional methods
used for patient diagnoses. 

Nucleic acid hybridization has been employed for investigating the identity
and establishing the presence of nucleic acids. Hybridization is based on
complementary base pairing. When complementary single stranded nucleic acids
are incubated together, the complementary base sequences pair to form
double-stranded hybrid molecules. The ability of single stranded deoxyribonucleic
acid (ssDNA) or ribonucleic acid (RNA) to form a hydrogen bonded structure with a
complementary nucleic acid sequence has been employed as an analytical tool in
molecular biology research. The availability of radioactive nucleoside
triphosphates of high specific activity and the development of methods for their
incorporation into DNA and RNA have made it possible to identify, isolate, and
characterize various nucleic acid sequences of biological interest. Nucleic acid
hybridization has great potential in diagnosing disease states associated with
unique nucleic acid sequences. These unique nucleic acid sequences may result
from genetic or environmental change in DNA by insertions, deletions, point
mutations, or by acquiring foreign DNA or RNA by means of infection by bacteria,
molds, fungi, and viruses. The application of nucleic acid hybridization as a
diagnostic tool in clinical medicine is limited due to the cost and effort associated
with the development of sufficiently sensitive and specific methods for detecting
potentially low concentrations of disease-related DNA or RNA present in the
complex mixture of nucleic acid sequences found in patient samples.
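
The complementary base pairing described above can be illustrated with a
minimal sketch. The function names are hypothetical and are not taken from the
Appendix; the sketch assumes DNA sequences written 5' to 3' with only the
letters A, C, G and T, and it tests only for perfect complementarity.

```python
# Watson-Crick pairing rules for DNA (illustrative only).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with `seq`, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def can_hybridize(probe: str, target: str) -> bool:
    """True if `probe` is perfectly complementary to a stretch of `target`.

    Assumption: real hybridization tolerates mismatches and depends on
    temperature and salt; this check ignores all of that.
    """
    return reverse_complement(probe) in target.upper()
```

For example, under these assumptions a probe GCAT would pair with the stretch
ATGC of a target strand.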

One method for detecting specific nucleic acid sequences generally
involves immobilization of the target nucleic acid on a solid support such as
nitrocellulose paper, cellulose paper, diazotized paper, or a nylon membrane.
After the target nucleic acid is fixed on the support, the support is contacted with a
suitably labeled probe nucleic acid for about two to forty-eight hours. After the
above time period, the solid support is washed several times at a controlled
temperature to remove unhybridized probe. The support is then dried and the
hybridized material is detected by autoradiography or by spectrometric methods.
When very low concentrations must be detected, the above method is slow and
labor intensive, and nonisotopic labels that are less readily detected than
radiolabels are frequently not suitable.

A method for the enzymatic amplification of specific segments of DNA
known as the polymerase chain reaction (PCR) method has been described. This
in vitro amplification procedure is based on repeated cycles of denaturation,
oligonucleotide primer annealing, and primer extension by thermophilic
polymerase, resulting in the exponential increase in copies of the region flanked
by the primers. The PCR primers, which anneal to opposite strands of the DNA,
are positioned so that the polymerase catalyzed extension product of one primer
can serve as a template strand for the other, leading to the accumulation of a
discrete fragment whose length is defined by the distance between the 5' ends of
the oligonucleotide primers.
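
The two quantitative statements above, that the fragment length is set by the
5' ends of the primers and that copies increase exponentially, can be sketched
as follows. The function names, the coordinate convention and the per-cycle
efficiency parameter are assumptions introduced for illustration; the growth
formula is an idealized model, not a measured result.

```python
def amplicon_length(fwd_5prime: int, rev_5prime: int) -> int:
    """Length of the discrete fragment, in bases.

    Assumption: both arguments are 0-based positions on the same reference
    strand, marking where the 5' end of each primer maps, inclusive.
    """
    if rev_5prime < fwd_5prime:
        raise ValueError("reverse primer must map downstream of forward primer")
    return rev_5prime - fwd_5prime + 1


def ideal_copies(initial: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` rounds of amplification.

    Assumption: every cycle multiplies the copy number by (1 + efficiency),
    where efficiency = 1.0 means perfect doubling.
    """
    return initial * (1.0 + efficiency) ** cycles
```

With perfect doubling, ten cycles starting from a single template would give
1024 copies under this model.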

Other methods for amplifying nucleic acids have also been developed.
These methods include single primer amplification, ligase chain reaction (LCR),
transcription-mediated amplification methods including 3SR and NASBA, and the
Q-beta-replicase method. Regardless of the amplification used, the amplified
product must be detected.

One method for detecting nucleic acids is to employ nucleic acid probes
that have sequences complementary to sequences in the target nucleic acid. A
nucleic acid probe may be, or may be capable of being, labeled with a reporter
group or may be, or may be capable of becoming, bound to a support. Detection
of signal depends upon the nature of the label or reporter group. Usually, the
probe is comprised of natural nucleotides such as ribonucleotides and
deoxyribonucleotides and their derivatives although unnatural nucleotide mimetics
such as peptide nucleic acids and oligomeric nucleoside phosphonates are also
used. Commonly, binding of the probes to the target is detected by means of a
label incorporated into the probe. Alternatively, the probe may be unlabeled and
the target nucleic acid labeled. Binding can be detected by separating the bound
probe or target from the free probe or target and detecting the label. In one
approach, a sandwich is formed comprised of one probe, which may be labeled,
the target and a probe that is or can become bound to a surface. Alternatively,
binding can be detected by a change in the signal-producing properties of the
label upon binding, such as a change in the emission efficiency of a fluorescent or
chemiluminescent label. This permits detection to be carried out without a
separation step. Finally, binding can be detected by labeling the target, allowing
the target to hybridize to a surface-bound probe, washing away the unbound
target and detecting the labeled target that remains.

Direct detection of labeled target hybridized to surface-bound probes is
particularly advantageous if the surface contains a mosaic of different probes that
are individually localized to discrete, known areas of the surface. Such ordered
arrays containing a large number of oligonucleotide probes have been developed
as tools for high throughput analyses of genotype and gene expression.
Oligonucleotides synthesized on a solid support recognize uniquely
complementary nucleic acids by hybridization, and arrays can be designed to
define specific target sequences, analyze gene expression patterns or identify
specific allelic variations. One difficulty in the design of oligonucleotide arrays is
that oligonucleotides targeted to different regions of the same gene can show
large differences in hybridization efficiency, presumably due, at least in part, to
the interplay between the secondary structures of the oligonucleotides and their
targets and the stability of the final probe/target hybridization product. A method
for predicting which oligonucleotides will show detectable hybridization would
substantially decrease the number of iterations required for optimal array design
and would be particularly useful when the total number of oligonucleotide probes
on the array is limited. A method to predict oligonucleotide hybridization efficiency
would also streamline the empirical approaches currently used to select potential
antisense therapeutics, which are designed to modulate gene expression in vivo
by hybridizing to specific messenger RNA (mRNA) molecules and inhibiting their
translation into proteins.

While it is well known that the structure of the target nucleic acid affects the
affinity of oligonucleotide hybridization, current methods for predicting target
structures from the primary sequence fail to predict target regions accessible for
oligonucleotide binding. Consequently, selection of oligonucleotides for antisense
reagents or oligonucleotide probe arrays has been largely empirical. As most of
the target sequence is sequestered by intramolecular base pairing and not
accessible for oligonucleotide binding, the process of identifying good
oligonucleotides has required large numbers of low efficiency experiments.

The design and implementation of algorithms that effectively predict the
ability of oligonucleotides to rapidly and avidly bind to complementary nucleotide
sequences has been an important problem in molecular biology since the
invention of facile methods for chemical DNA synthesis. The subsequent
inventions of the polymerase chain reaction (PCR), antisense inhibition of gene
expression and oligonucleotide array methods for performing massively parallel
hybridization experiments have made the need for effective predictive algorithms
even more critical.

Previous attempts to solve the nucleic acid probe design problem include
PCR primer design software applications (e.g., OLIGO®), neural networks, PCR
primer design applications that search for sequences that possess minimal ability
to cross-hybridize with other targets present in a sample (e.g., HYBsimulator™),
and approaches that attempt to predict the efficiency of antisense sequence
suppression of mRNA translation from a combination of predicted nucleic acid
duplex melting temperature and predicted target strand structure. The methods
that predict effective oligonucleotide primers for performing PCR from DNA
templates work well for that application where relatively stringent conditions are
employed. This is because PCR experimental design greatly simplifies the
prediction problem: hybridization is performed at high temperature, at relatively
low ionic strength and in the presence of a large molar excess of oligonucleotide.
Under these conditions, the oligonucleotide and target secondary structures are
relatively unimportant.

Unfortunately, these conditions do not apply to oligonucleotide arrays,
which are usually hybridized under relatively non-denaturing conditions, or to
antisense suppression of gene expression, which takes place in vivo.
Oligonucleotide arrays can contain hundreds of thousands of different sequences
and conditions are chosen to allow the oligonucleotide with the lowest melting
temperature to hybridize efficiently. These "lowest common denominator"
conditions are usually relatively non-denaturing and secondary structure
constraints become significant. Accordingly, the above applications require new
predictive methods that are capable of estimating the effects of oligonucleotide
and target structure on hybridization efficiency. For these reasons, current
algorithms for designing PCR primer oligonucleotides fail badly when applied to
the problems of oligonucleotide array or antisense oligonucleotide design.

To date, the most effective approach for identifying oligonucleotides with
good hybridization efficiency has been an empirical one. Such an approach
involves the synthesis of large numbers of oligonucleotide probes for a given
target nucleotide sequence. Arrays are formed that include the above
oligonucleotide probes. Hybridization experiments are carried out to determine
which of the oligonucleotide probes exhibit good hybridization efficiencies.
Examples of such an approach are found in D. Lockhart, et al., Nature Biotech.,
infra, L. Wodicka, et al., Nature Biotechnology, infra, and N. Milner, et al., Nature
Biotech., infra. One major drawback to this approach is the vast number of
oligonucleotides that must be synthesized in order to achieve a satisfactory result.
Typically, about 2%-5% of the test probes synthesized yield acceptable signal
levels.

The use of neural networks for oligonucleotide design has also been
investigated. Neural networks are easily taught with real data; they therefore
afford a general approach to many problems. However, their performance is
limited by the "senses" that they are given. An analogy works best here: the
human brain is an astoundingly capable neural network, but a blind person cannot
be taught to reliably distinguish colors by smell. In addition, a large amount of
data is required to adequately teach a neural network to perform its job well. A
comprehensive database for either oligonucleotide array design or antisense
suppression of gene expression has not been made available. For these reasons,
the performance reported to date of neural network solutions against the probe
design problem is mediocre.

Finally, approaches that have attempted to use target nucleic acid folding
calculations to predict experimental results inferred to depend upon hybridization
efficiency (e.g., antisense suppression of mRNA translation) have so far only
demonstrated that the predictions of current nucleic acid folding calculations
correlate poorly with observed behavior. The probable reason for this is that the
structures predicted by such programs for long sequences are poor predictors of
chemical reality; the results of experiments that attempt to confirm the predictions
of such calculations support this assessment. Recent improvements to this
approach which use predicted RNA structure topology as a predictor of relative
RNA/RNA association kinetics have been more successful at forecasting the
results of antisense experiments. However, these methods are not
computationally efficient, and have so far only been shown to work for targets less
than 100 bases long. Such methods are therefore not yet capable of predicting
the behavior of full-length mRNA targets, which are typically between 1,000 and
2,000 bases in length.

2. Description of the Related Art.

U.S. Patent No. 5,512,438 (Ecker) discloses the inhibition of RNA
expression by forming a pseudo-half knot RNA at the target's RNA secondary
structure using antisense oligonucleotides.

Cook, et al., in U.S. Patent No. 5,670,633 discuss sugar-modified
oligonucleotides that detect and modulate gene expression.

Antisense oligonucleotide inhibition of the RAS gene is disclosed in U.S.
Patent No. 5,582,986 (Monia, et al.).

U.S. Patent No. 5,593,834 (Lane, et al.) discusses a method of preparing
DNA sequences with known ligand binding characteristics.

Mitsuhashi, et al., in U.S. Patent No. 5,556,749 discuss a computerized
method for designing optimal DNA probes and an oligonucleotide probe design
station.

U.S. Patent No. 5,081,584 (Omichinski, et al.) discloses a computer-assisted
design of anti-peptides based on the amino acid sequence of a target
peptide.

A PCR primer design application that searches for sequences that possess
minimal ability to cross-hybridize with other targets present in a sample is
available as HYBsimulator™, version 2.0, AGCT, Inc., 2102 Business Center
Drive, Suite 170, Irvine, CA 92715 (714) 833-9983.

A PCR primer design software application is available as OLIGO®, version
5.0, National Biosciences, Inc., 3650 Annapolis Lane North, #140, Plymouth, MN
55447 (800) 747-4362.



D. J. Lockhart, et al., Nature Biotech., 14:1675-1684 (1996) describe a
neural network approach to the selection of efficient surface-bound
oligonucleotide probes.

M. Mitsuhashi, et al., Nature, 367:759-761 (1994) disclose a method for
designing specific oligonucleotide probes and primers by modeling the potential
cross-hybridization of candidate probes to non-target sequences known to be
present in samples.

R. A. Stull, et al., Nuc. Acids Res., 20:3501-3508 (1992) describe a method
of predicting the efficacy of antisense oligonucleotides, using predicted target
secondary structure and predicted oligonucleotide/target binding free energy as
input parameters.

N. Milner, et al., Nature Biotechnology, 15:537-541 (1997) compare
observed patterns of probe hybridization to those expected from the predicted
secondary structure of the nucleic acid target.

L. Wodicka, et al., Nature Biotechnology, 15:1359-1367 (1997) describe
simple rules for avoiding inefficient and non-specific probes during design and
synthesis of oligonucleotide arrays.

J. SantaLucia Jr., et al., Biochemistry, 35:3555 (1996) disclose parameters
and methods for the calculation of thermodynamic properties of DNA/DNA
homoduplexes.

N. Sugimoto, et al., Biochemistry, 34:11211 (1995) disclose parameters
and methods for the calculation of thermodynamic properties of DNA/RNA
heteroduplexes.

J. A. Jaeger, et al., Proc. Natl. Acad. Sci. USA, 86:7706 (1989) disclose
methods for estimation of the free energy of the most stable intramolecular
structure of a single-stranded polynucleotide, by means of a dynamic
programming algorithm.

S. F. Altschul, et al., Nature Genetics, 6:119-129 (1994) disclose methods
for calculating the complexity and information content of amino acid and nucleic
acid sequences.

T. A. Weber and E. Helfand, J. Chem. Phys., 71, 4760 (1979) describe
approaches for the modeling of polymer structures by molecular dynamics
simulations.

V. Patzel and G. Sczakiel, Nature Biotech., 16, 64-68 (1998) disclose
methods for estimating rate constants for association of antisense RNA molecules
with mRNA targets by examination of predicted antisense RNA secondary
structures.

Light-generated oligonucleotide arrays for rapid DNA sequence analysis is
described by A. C. Pease, et al., Proc. Nat. Acad. Sci. USA (1994) 91:5022-5026.

Mitsuhashi discusses basic requirements for designing optimal
oligonucleotide probe sequences in J. Clinical Laboratory Analysis (1996)
10:277-284.

Rychlik, et al., disclose a computer program for choosing optimal
oligonucleotides for filter hybridization, sequencing and in vitro amplification of
DNA in Nucleic Acids Research (1989) 17(21):8543-8551.

A strategy for designing specific antisense oligonucleotide sequences is
described by Mitsuhashi in J. Gastroenterol. (1997) 32:282-287.

Mitsuhashi discusses basic requirements for designing optimal PCR
primers in J. Clinical Laboratory Analysis (1996) 10:285-293.

Hyndman, et al., disclose software to determine optimal oligonucleotide
sequences based on hybridization simulation data in BioTechniques (1996)
20(6):1090-1094.

Eberhardt discloses a shell program for the design of PCR primers using
Genetics Computer Group (GCG) software (7.1) on VAX/VMS™ systems in
BioTechniques (1992) 13(6):914-917.

Chen, et al., disclose a computer program for calculating the melting
temperature of degenerate oligonucleotides used in PCR or hybridization in
BioTechniques (1997) 22(6):1158-1160.

Partial thermodynamic parameters for prediction stability and washing
behavior of DNA duplexes immobilized on gel matrix is described by Kunitsyn, et
al., in J. Biomolecular Structure & Dynamics, ISSN 0739-1102 (1996)
14(1):239-244.

SUMMARY OF THE INVENTION 
One embodiment of the present invention is a method for predicting the
potential of an oligonucleotide to hybridize to a target nucleotide sequence. A
predetermined set of unique oligonucleotide sequences is identified. The unique
oligonucleotide sequences are chosen to sample the entire length of a nucleotide
sequence that is hybridizable with the target nucleotide sequence. At least one
parameter that is predictive of the ability of each of the oligonucleotides specified
by the set of sequences to hybridize to the target nucleotide sequence is
determined and evaluated for each of the above oligonucleotide sequences. A
subset of oligonucleotide sequences within the predetermined set of unique
oligonucleotide sequences is identified based on the examination of the parameter
values. Finally, oligonucleotide sequences in the subset are identified that are
clustered along one or more regions of the nucleotide sequence that is
hybridizable to the target nucleotide sequence. The oligonucleotide probes
corresponding to the identified sequences find use in polynucleotide assays
particularly where the assays involve oligonucleotide arrays. For a discussion of
oligonucleotide arrays, see, e.g., U.S. Patent No. 5,700,637 (E. Southern) and
U.S. Patent No. 5,667,667 (E. Southern), the relevant disclosures of which are
incorporated herein by reference.
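
The subset selection and cluster identification steps of this embodiment might
be sketched as follows. This is an illustration only and is not the code of the
Appendix. The function names are hypothetical, and three assumptions are made:
each oligonucleotide is keyed by its start position, a higher parameter value is
more favorable, and a cluster is a run of selected start positions separated by
no more than `max_gap` nucleotides.

```python
from typing import Dict, List


def select_subset(scores: Dict[int, float], cutoff: float) -> List[int]:
    """Return sorted start positions whose parameter value meets the cutoff.

    Assumption: values at or above `cutoff` are considered favorable.
    """
    return sorted(pos for pos, value in scores.items() if value >= cutoff)


def find_clusters(positions: List[int], max_gap: int = 1) -> List[List[int]]:
    """Group start positions into clusters of neighboring oligonucleotides.

    Assumption: two positions belong to the same cluster when they are at
    most `max_gap` nucleotides apart.
    """
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)  # extends the current run
        else:
            clusters.append([pos])    # starts a new run
    return clusters
```

Given hypothetical scores for seven start positions, a cutoff of 0.5 might keep
positions 1, 2, 4, 5 and 6, which then fall into two clusters, one of two
members and one of three.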

Another embodiment of the present invention is a method for predicting the
potential of an oligonucleotide to hybridize to a complementary target nucleotide
sequence. A set of overlapping oligonucleotide sequences is identified based on a
nucleotide sequence that is complementary to the target nucleotide sequence. At
least two parameters that are independently predictive of the ability of each of the
oligonucleotides specified by the oligonucleotide sequences to hybridize to the
target nucleotide sequence are determined and evaluated for each of the
oligonucleotide sequences. Independence is assured by requiring that the
parameters be poorly correlated with respect to one another. A subset of
oligonucleotide sequences within the set of oligonucleotide sequences is identified
based on the examination of the parameter values. Finally, oligonucleotide
sequences in the subset are identified that are clustered along one or more
regions of the nucleotide sequence that is complementary to the target nucleotide
sequence.
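
One way to check that two parameters are poorly correlated is to compute a
correlation coefficient over the values obtained for the set of oligonucleotide
sequences. The sketch below uses the Pearson coefficient; the choice of that
statistic, the function names and the threshold of 0.5 are all assumptions made
for illustration and are not stated in this disclosure.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / math.sqrt(sxx * syy)


def poorly_correlated(x: Sequence[float], y: Sequence[float],
                      max_abs_r: float = 0.5) -> bool:
    """True if |r| does not exceed `max_abs_r` (a hypothetical threshold)."""
    return abs(pearson_r(x, y)) <= max_abs_r
```

Two parameters that rise and fall together would fail this test, so only one
of them would add predictive information under these assumptions.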

Another embodiment of the present invention is a method for predicting the
potential of an oligonucleotide to hybridize to a complementary target nucleotide
sequence. A set of overlapping oligonucleotide sequences is obtained based on a
nucleotide sequence of length L, complementary to the target nucleotide
sequence. The oligonucleotide sequences of the set of overlapping
oligonucleotide sequences are of identical length N and spaced one nucleotide
apart. The set comprises L-N+1 oligonucleotide sequences. Parameters are
determined for each of the oligonucleotide sequences of the set of overlapping
oligonucleotide sequences. One parameter is the predicted melting temperature
of the duplex of each of the oligonucleotides specified by the oligonucleotide
sequences and the target nucleotide sequence, corrected for salt concentration.
The other parameter is the predicted free energy of the most stable intramolecular
structure of each of the oligonucleotides specified by the oligonucleotide
sequences at the temperature of hybridization of the oligonucleotide with the
target nucleotide sequence. A subset of oligonucleotide sequences within the set
of oligonucleotide sequences is selected based on an examination of the
parameter values by establishing cut-off values for each of the parameters.
Oligonucleotide sequences in the subset that are clustered along one or more
regions of the complementary nucleotide sequence are ranked based on the sizes
of the clusters of oligonucleotide sequences. Finally, a subset of the clustered
oligonucleotide sequences is selected that statistically samples the clusters of
oligonucleotide sequences. The selected sampled subset is used to specify the
synthesis of oligonucleotides for experimental evaluation.
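
The tiling, ranking and sampling steps of this embodiment can be sketched as
follows; the melting temperature and free energy calculations are deliberately
left out. The function names are hypothetical, clusters are represented as lists
of start positions, and the evenly spaced sampling rule is an assumption, since
this summary does not state how the clusters are statistically sampled.

```python
from typing import List


def tile_oligos(complement: str, n: int) -> List[str]:
    """All length-n sequences spaced one nucleotide apart (L - N + 1 of them)."""
    if n < 1 or n > len(complement):
        raise ValueError("oligonucleotide length must be between 1 and L")
    return [complement[i:i + n] for i in range(len(complement) - n + 1)]


def rank_clusters(clusters: List[List[int]]) -> List[List[int]]:
    """Order clusters from largest to smallest."""
    return sorted(clusters, key=len, reverse=True)


def sample_cluster(cluster: List[int], k: int) -> List[int]:
    """Pick up to k members of a cluster.

    Assumption: members are taken at roughly even spacing across the cluster.
    """
    if k >= len(cluster):
        return list(cluster)
    if k == 1:
        return [cluster[len(cluster) // 2]]
    step = (len(cluster) - 1) / (k - 1)
    return [cluster[round(i * step)] for i in range(k)]
```

For a complementary sequence of length 10 and oligonucleotides of length 4, the
tiling yields 10 - 4 + 1 = 7 sequences, in agreement with the L-N+1 count given
above.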

Another aspect of the present invention is a computer based method for
predicting the potential of an oligonucleotide to hybridize to a target nucleotide
sequence. A predetermined number of unique oligonucleotides within a
nucleotide sequence that is hybridizable with the target nucleotide sequence is
identified under computer control. The oligonucleotides are chosen to sample the
entire length of the nucleotide sequence. A value is determined and evaluated
under computer control for each of the oligonucleotides for at least one parameter
that is independently predictive of the ability of each of the oligonucleotides to
hybridize to the target nucleotide sequence. The parameter values are stored. A
subset of oligonucleotides within the predetermined number of unique
oligonucleotides is identified by examination of the stored parameter values under
computer control. Then, oligonucleotides in the subset that are clustered along a
region of the nucleotide sequence that is hybridizable to the target nucleotide
sequence are identified under computer control.

Another aspect of the present invention is a computer system for
conducting a method for predicting the potential of an oligonucleotide to hybridize
to a target nucleotide sequence. The system comprises (a) input means for
introducing a target nucleotide sequence into the computer system, (b) means for
determining a number of unique oligonucleotide sequences that are within a
nucleotide sequence that is hybridizable with the target nucleotide sequence
where the oligonucleotide sequences are chosen to sample the entire length of
the nucleotide sequence, (c) memory means for storing the oligonucleotide
sequences, (d) means for controlling the computer system to carry out for each of
the oligonucleotide sequences a determination and evaluation of a value for at
least one parameter that is independently predictive of the ability of each of the
oligonucleotide sequences to hybridize to the target nucleotide sequence, (e)
means for storing the parameter values, (f) means for controlling the computer to
carry out an identification, from the stored parameter values, of a subset of
oligonucleotide sequences within the number of unique oligonucleotide sequences
based on the examination of the parameter, (g) means for storing the subset of
oligonucleotides, (h) means for controlling the computer to carry out an
identification of oligonucleotide sequences in the subset that are clustered along a
region of the nucleotide sequence that is hybridizable to the target nucleotide
sequence, (i) means for storing the oligonucleotide sequences in the subset, and
(j) means for outputting data relating to the oligonucleotide sequences in the
subset.

Attorney Docket No. 10971464-1



BRIEF DESCRIPTION OF THE DRAWINGS 



Fig. 1 is a general flow chart depicting the method of the present
invention.

Fig. 2 is a flow chart depicting a preferred embodiment of a method in
accordance with the present invention.

Fig. 3 is a contour plot of normalized hybridization intensity from multiple
experiments, as a function of the free energy of the most stable probe
intramolecular structure (ΔGmfold) and the difference between the predicted
RNA/DNA heteroduplex melting temperature (Tm) and the temperature of
hybridization (Thyb).

Fig. 4 shows the observed hybridization patterns for oligonucleotides
selected using a method in accordance with the present invention and additional
oligonucleotides to a portion of the rabbit β-globin gene (radiolabeled antisense
RNA target).

Fig. 5 shows the observed hybridization patterns for oligonucleotides
selected using a method in accordance with the present invention and additional
oligonucleotides to the HIV PRT gene (fluorescein-labeled sense RNA target).

Fig. 6 shows the observed hybridization patterns for oligonucleotides
selected using a method in accordance with the present invention and additional
oligonucleotides to the G3PDH gene (fluorescein-labeled antisense RNA target).

Fig. 7 shows the observed hybridization patterns for oligonucleotides
selected using a method in accordance with the present invention and additional
oligonucleotides to the p53 gene (fluorescein-labeled antisense RNA target).

Fig. 8 shows the observed hybridization patterns for oligonucleotides
selected using a method in accordance with the present invention and additional
oligonucleotides to the HIV PRTs gene (using GeneChip™ data).

DEFINITIONS 

Before proceeding further with a description of the specific embodiments of 
the present invention, a number of terms will be defined. 



Nucleic Acids:

Polynucleotide - a compound or composition that is a polymeric nucleotide
or nucleic acid polymer. The polynucleotide may be a natural compound or a
synthetic compound. In the context of an assay, the polynucleotide is often
referred to as a polynucleotide analyte. The polynucleotide can have from about
20 to 5,000,000 or more nucleotides. The larger polynucleotides are generally
found in the natural state. In an isolated state the polynucleotide can have about
30 to 50,000 or more nucleotides, usually about 100 to 20,000 nucleotides, more
frequently 500 to 10,000 nucleotides. It is thus obvious that isolation of a
polynucleotide from the natural state often results in fragmentation. The
polynucleotides include nucleic acids, and fragments thereof, from any source in
purified or unpurified form including DNA (dsDNA and ssDNA) and RNA, including
tRNA, mRNA, rRNA, mitochondrial DNA and RNA, chloroplast DNA and RNA,
DNA/RNA hybrids, or mixtures thereof, genes, chromosomes, plasmids, the
genomes of biological material such as microorganisms, e.g., bacteria, yeasts,
viruses, viroids, molds, fungi, plants, animals, humans, and the like. The
polynucleotide can be only a minor fraction of a complex mixture such as a
biological sample. Also included are genes, such as hemoglobin gene for
sickle-cell anemia, cystic fibrosis gene, oncogenes, cDNA, and the like.







The polynucleotide can be obtained from various biological materials by
procedures well known in the art. The polynucleotide, where appropriate, may be
cleaved to obtain a fragment that contains a target nucleotide sequence, for
example, by shearing or by treatment with a restriction endonuclease or other site
specific chemical cleavage method.

For purposes of this invention, the polynucleotide, or a cleaved fragment
obtained from the polynucleotide, will usually be at least partially denatured or
single stranded or treated to render it denatured or single stranded. Such
treatments are well known in the art and include, for instance, heat or alkali
treatment, or enzymatic digestion of one strand. For example, dsDNA can be
heated at 90-100° C. for a period of about 1 to 10 minutes to produce denatured
material.

Target nucleotide sequence - a sequence of nucleotides to be identified,
usually existing within a portion or all of a polynucleotide, usually a polynucleotide
analyte. The identity of the target nucleotide sequence generally is known to an
extent sufficient to allow preparation of various sequences hybridizable with the
target nucleotide sequence and of oligonucleotides, such as probes and primers,
and other molecules necessary for conducting methods in accordance with the
present invention, an amplification of the target polynucleotide, and so forth.

The target sequence usually contains from about 30 to 5,000 or more
nucleotides, preferably 50 to 1,000 nucleotides. The target nucleotide sequence
is generally a fraction of a larger molecule or it may be substantially the entire
molecule such as a polynucleotide as described above. The minimum number of
nucleotides in the target nucleotide sequence is selected to assure that the
presence of a target polynucleotide in a sample is a specific indicator of the
presence of polynucleotide in a sample. The maximum number of nucleotides in
the target nucleotide sequence is normally governed by several factors: the length
of the polynucleotide from which it is derived, the tendency of such polynucleotide
to be broken by shearing or other processes during isolation, the efficiency of any
procedures required to prepare the sample for analysis (e.g., transcription of a
DNA template into RNA) and the efficiency of detection and/or amplification of the
target nucleotide sequence, where appropriate.

Oligonucleotide - a polynucleotide, usually single stranded, usually a
synthetic polynucleotide but may be a naturally occurring polynucleotide. The
oligonucleotide(s) are usually comprised of a sequence of at least 5 nucleotides,
preferably, 10 to 100 nucleotides, more preferably, 20 to 50 nucleotides, and
usually 10 to 30 nucleotides, more preferably, 20 to 30 nucleotides, and desirably
about 25 nucleotides in length.

Various techniques can be employed for preparing an oligonucleotide.
Such oligonucleotides can be obtained by biological synthesis or by chemical
synthesis. For short sequences (up to about 100 nucleotides), chemical synthesis
will frequently be more economical as compared to the biological synthesis. In
addition to economy, chemical synthesis provides a convenient way of
incorporating low molecular weight compounds and/or modified bases during
specific synthesis steps. Furthermore, chemical synthesis is very flexible in the
choice of length and region of the target polynucleotide binding sequence. The
oligonucleotide can be synthesized by standard methods such as those used in
commercial automated nucleic acid synthesizers. Chemical synthesis of DNA on
a suitably modified glass or resin can result in DNA covalently attached to the
surface. This may offer advantages in washing and sample handling. For longer
sequences standard replication methods employed in molecular biology can be
used such as the use of M13 for single stranded DNA as described by J. Messing
(1983) Methods Enzymol., 101:20-78.

Other methods of oligonucleotide synthesis include phosphotriester and
phosphodiester methods (Narang, et al. (1979) Meth. Enzymol. 68:90) and
synthesis on a support (Beaucage, et al. (1981) Tetrahedron Letters
22:1859-1862) as well as phosphoramidite techniques (Caruthers, M. H., et al.,
"Methods in Enzymology," Vol. 154, pp. 287-314 (1988)) and others described in
"Synthesis and Applications of DNA and RNA," S.A. Narang, editor, Academic
Press, New York, 1987, and the references contained therein. The chemical
synthesis via a photolithographic method of spatially addressable arrays of
oligonucleotides bound to glass surfaces is described by A. C. Pease, et al., Proc.
Nat. Acad. Sci. USA (1994) 91:5022-5026.

Oligonucleotide probe - an oligonucleotide employed to bind to a portion of
a polynucleotide such as another oligonucleotide or a target nucleotide sequence.
The design and preparation of the oligonucleotide probes are generally
dependent upon the sensitivity and specificity required, the sequence of the target
polynucleotide and, in certain cases, the biological significance of certain portions
of the target polynucleotide sequence.



Oligonucleotide primer(s) - an oligonucleotide that is usually employed in a
chain extension on a polynucleotide template such as in, for example, an
amplification of a nucleic acid. The oligonucleotide primer is usually a synthetic
nucleotide that is single stranded, containing a sequence at its 3'-end that is
capable of hybridizing with a defined sequence of the target polynucleotide.
Normally, an oligonucleotide primer has at least 80%, preferably 90%, more
preferably 95%, most preferably 100%, complementarity to a defined sequence or
primer binding site. The number of nucleotides in the hybridizable sequence of an
oligonucleotide primer should be such that stringency conditions used to hybridize
the oligonucleotide primer will prevent excessive random non-specific
hybridization. Usually, the number of nucleotides in the oligonucleotide primer will
be at least as great as the defined sequence of the target polynucleotide, namely,
at least ten nucleotides, preferably at least 15 nucleotides, and generally from
about 10 to 200, preferably 20 to 50, nucleotides.

In general, in primer extension, amplification primers hybridize to, and are
extended along (chain extended), at least the target nucleotide sequence within
the target polynucleotide and, thus, the target sequence acts as a template. The
extended primers are chain "extension products." The target sequence usually
lies between two defined sequences but need not. In general, the primers
hybridize with the defined sequences or with at least a portion of such target
polynucleotide, usually at least a ten-nucleotide segment at the 3'-end thereof and
preferably at least 15, frequently a 20 to 50 nucleotide segment thereof.

Nucleoside triphosphates - nucleosides having a 5'-triphosphate
substituent. The nucleosides are pentose sugar derivatives of nitrogenous bases
of either purine or pyrimidine derivation, covalently bonded to the 1'-carbon of the
pentose sugar, which is usually a deoxyribose or a ribose. The purine bases
include adenine (A), guanine (G), inosine (I), and derivatives and analogs thereof.
The pyrimidine bases include cytosine (C), thymine (T), uracil (U), and derivatives
and analogs thereof. Nucleoside triphosphates include deoxyribonucleoside
triphosphates such as the four common deoxyribonucleoside triphosphates dATP,
dCTP, dGTP and dTTP and ribonucleoside triphosphates such as the four
common triphosphates rATP, rCTP, rGTP and rUTP.

The term "nucleoside triphosphates" also includes derivatives and analogs
thereof, which are exemplified by those derivatives that are recognized and
polymerized in a similar manner to the underivatized nucleoside triphosphates.

Nucleotide - a base-sugar-phosphate combination that is the monomeric
unit of nucleic acid polymers, i.e., DNA and RNA. The term "nucleotide" as used
herein includes modified nucleotides as defined below.

DNA - deoxyribonucleic acid.

RNA - ribonucleic acid.

Modified nucleotide - a unit in a nucleic acid polymer that contains a
modified base, sugar or phosphate group. The modified nucleotide can be
produced by a chemical modification of the nucleotide either as part of the nucleic
acid polymer or prior to the incorporation of the modified nucleotide into the
nucleic acid polymer. For example, the methods mentioned above for the
synthesis of an oligonucleotide may be employed. In another approach a
modified nucleotide can be produced by incorporating a modified nucleoside
triphosphate into the polymer chain during an amplification reaction. Examples of
modified nucleotides, by way of illustration and not limitation, include
dideoxynucleotides, derivatives or analogs that are biotinylated, amine modified,
alkylated, fluorophore-labeled, and the like and also include phosphorothioate,
phosphite, ring atom modified derivatives, and so forth.

Nucleoside - a base-sugar combination, i.e., a nucleotide lacking a
phosphate moiety.

Nucleotide polymerase - a catalyst, usually an enzyme, for forming an
extension of a polynucleotide along a DNA or RNA template where the extension
is complementary thereto. The nucleotide polymerase is a template dependent
polynucleotide polymerase and utilizes nucleoside triphosphates as building
blocks for extending the 3'-end of a polynucleotide to provide a sequence
complementary with the polynucleotide template. Usually, the catalysts are
enzymes, such as DNA polymerases, for example, prokaryotic DNA polymerase
(I, II, or III), T4 DNA polymerase, T7 DNA polymerase, Klenow fragment, reverse
transcriptase, Vent DNA polymerase, Pfu DNA polymerase, Taq DNA
polymerase, and the like, or RNA polymerases, such as T3 and T7 RNA
polymerases. Polymerase enzymes may be derived from any source such as
cells, bacteria such as E. coli, plants, animals, virus, thermophilic bacteria, and so
forth.

Amplification of nucleic acids or polynucleotides - any method that results
in the formation of one or more copies of a nucleic acid or polynucleotide
molecule (exponential amplification) or in the formation of one or more copies of
only the complement of a nucleic acid or polynucleotide molecule (linear
amplification).

Hybridization (hybridizing) and binding - in the context of nucleotide
sequences these terms are used interchangeably herein. The ability of two
nucleotide sequences to hybridize with each other is based on the degree of
complementarity of the two nucleotide sequences, which in turn is based on the
fraction of matched complementary nucleotide pairs. The more nucleotides in a
given sequence that are complementary to another sequence, the more stringent
the conditions can be for hybridization and the more specific will be the binding of
the two sequences. Increased stringency is achieved by elevating the
temperature, increasing the ratio of co-solvents, lowering the salt concentration,
and the like.

Hybridization efficiency - the productivity of a hybridization reaction,
measured as either the absolute or relative yield of oligonucleotide
probe/polynucleotide target duplex formed under a given set of conditions in a
given amount of time.

Homologous or substantially identical polynucleotides - In general, two
polynucleotide sequences that are identical or can each hybridize to the same
polynucleotide sequence are homologous. The two sequences are homologous
or substantially identical where the sequences each have at least 90%, preferably
100%, of the same or analogous base sequence where thymine (T) and
uracil (U) are considered the same. Thus, the ribonucleotides A, U, C and G are
taken as analogous to the deoxynucleotides dA, dT, dC, and dG, respectively.
Homologous sequences can both be DNA or one can be DNA and the other RNA.

Complementary - Two sequences are complementary when the sequence
of one can bind to the sequence of the other in an anti-parallel sense wherein the
3'-end of each sequence binds to the 5'-end of the other sequence and each A,
T(U), G, and C of one sequence is then aligned with a T(U), A, C, and G,
respectively, of the other sequence. RNA sequences can also include
complementary G/U or U/G basepairs.

Member of a specific binding pair ("sbp member") - one of two different
molecules, having an area on the surface or in a cavity that specifically binds to
and is thereby defined as complementary with a particular spatial and polar
organization of the other molecule. The members of the specific binding pair are
referred to as cognates or as ligand and receptor (antiligand). These may be
members of an immunological pair such as antigen-antibody, or may be
operator-repressor, nuclease-nucleotide, biotin-avidin, hormones-hormone
receptors, nucleic acid duplexes, IgG-protein A, DNA-DNA, DNA-RNA, and the
like.

Ligand - any compound for which a receptor naturally exists or can be
prepared.

Receptor ("antiligand") - any compound or composition capable of
recognizing a particular spatial and polar organization of a molecule, e.g., epitopic
or determinant site. Illustrative receptors include naturally occurring receptors,
e.g., thyroxine binding globulin, antibodies, enzymes, Fab fragments, lectins,
nucleic acids, repressors, protection enzymes, protein A, complement component
C1q, DNA binding proteins or ligands and the like.



Oligonucleotide Properties:

Potential of an oligonucleotide to hybridize - the combination of duplex
formation rate and duplex dissociation rate that determines the amount of duplex
nucleic acid hybrid that will form under a given set of experimental conditions in a
given amount of time.

Parameter - a factor that provides information about the hybridization of an
oligonucleotide with a target nucleotide sequence. Generally, the factor is one
that is predictive of the ability of an oligonucleotide to hybridize with a target
nucleotide sequence. Such factors include composition factors, thermodynamic
factors, chemosynthetic efficiencies, kinetic factors, and the like.

Parameter predictive of the ability to hybridize - a parameter calculated
from a set of oligonucleotide sequences wherein the parameter positively
correlates with observed hybridization efficiencies of those sequences. The
parameter is, therefore, predictive of the ability of those sequences to hybridize.
"Positive correlation" can be rigorously defined in statistical terms. The correlation
coefficient ρx,y of two experimentally measured discrete quantities x and y (N
values in each set) is defined as

$$\rho_{x,y} = \frac{\mathrm{Covariance}(x, y)}{\sqrt{\mathrm{Variance}(x)\,\mathrm{Variance}(y)}},$$

where the Covariance(x, y) is defined by

$$\mathrm{Covariance}(x, y) = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu_x\right)\left(y_i - \mu_y\right).$$

The quantities μx and μy are the averages of the quantities x and y, while the
variances are simply the squares of the standard deviations (defined below). The
correlation coefficient is a dimensionless (unitless) quantity between -1 and 1. A
correlation coefficient of 1 or -1 indicates that x and y have a linear relationship
with a positive or negative slope, respectively. A correlation coefficient of zero
indicates no relationship; for example, two sets of random numbers will yield a
correlation coefficient near zero. Intermediate correlation coefficients indicate
intermediate degrees of relatedness between two sets of numbers. The
correlation coefficient is a good statistical measure of the degree to which one set
of numbers predicts a second set of numbers.

Composition factor - a numerical factor based solely on the composition or
sequence of an oligonucleotide without involving additional parameters, such as
experimentally measured nearest-neighbor thermodynamic parameters. For
instance, the fraction (G+C), given by the formula

$$x_{G+C} = \frac{n_G + n_C}{n_G + n_C + n_A + n_{T\,\mathrm{or}\,U}},$$

where nG, nC, nA and nT or U are the numbers of G, C, A and T (or U) bases in an
oligonucleotide, is an example of a composition factor. Examples of composition
factors, by way of illustration and not limitation, are mole fraction (G+C), percent
(G+C), sequence complexity, sequence information content, frequency of
occurrence of specific oligonucleotide sequences in a sequence database and so
forth.
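
As an illustration of a composition factor, the mole fraction (G+C) can be evaluated directly from the base counts. The following is a minimal sketch only; the function name `gc_fraction` is an assumption made for this example and does not come from the Appendix.

```python
def gc_fraction(seq: str) -> float:
    """Mole fraction (G+C): (nG + nC) / (nG + nC + nA + nT/U).

    Illustrative helper; T and U are counted together, as in the text.
    """
    s = seq.upper()
    n_g, n_c, n_a = s.count("G"), s.count("C"), s.count("A")
    n_tu = s.count("T") + s.count("U")
    total = n_g + n_c + n_a + n_tu
    if total == 0:
        raise ValueError("sequence contains no A, C, G, T or U bases")
    return (n_g + n_c) / total


# ATGGACTTAGCA has 3 G and 2 C among 12 bases.
print(gc_fraction("ATGGACTTAGCA"))
```

Percent (G+C) is the same quantity multiplied by 100.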

Thermodynamic factor - numerical factors that predict the behavior of an
oligonucleotide in some process that has reached equilibrium. For instance, the
free energy of duplex formation between an oligonucleotide and its complement is
a thermodynamic factor. Thermodynamic factors for systems that can be
subdivided into constituent parts are often estimated by summing contributions
from the constituent parts. Such an approach is used to calculate the
thermodynamic properties of oligonucleotides.

Examples of thermodynamic factors, by way of illustration and not
limitation, are predicted duplex melting temperature, predicted enthalpy of duplex
formation, predicted entropy of duplex formation, free energy of duplex formation,
predicted melting temperature of the most stable intramolecular structure of the
oligonucleotide or its complement, predicted enthalpy of the most stable
intramolecular structure of the oligonucleotide or its complement, predicted
entropy of the most stable intramolecular structure of the oligonucleotide or its
complement, predicted free energy of the most stable intramolecular structure of
the oligonucleotide or its complement, predicted melting temperature of the most
stable hairpin structure of the oligonucleotide or its complement, predicted
enthalpy of the most stable hairpin structure of the oligonucleotide or its
complement, predicted entropy of the most stable hairpin structure of the
oligonucleotide or its complement, predicted free energy of the most stable hairpin
structure of the oligonucleotide or its complement, thermodynamic partition
function for intramolecular structure of the oligonucleotide or its complement and
the like.

Chemosynthetic efficiency - oligonucleotides and nucleotide sequences
may both be made by sequential polymerization of the constituent nucleotides.
However, the individual addition steps are not perfect; they instead proceed with
some fractional efficiency that is less than unity. This may vary as a function of
position in the sequence. Therefore, what is really produced is a family of
molecules that consists of the desired molecule plus many truncated sequences.
These "failure sequences" affect the observed efficiency of hybridization between
an oligonucleotide and its complementary target. Examples of chemosynthetic
efficiency factors, by way of illustration and not limitation, are coupling efficiencies,
overall efficiencies of the synthesis of a target nucleotide sequence or an
oligonucleotide probe, and so forth.
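
The effect of imperfect addition steps can be made concrete with a short calculation. The sketch below assumes that the steps are independent, so that the fraction of full-length product is the product of the per-step efficiencies; the function name and the 0.99 per-step value are illustrative assumptions, not figures from this disclosure.

```python
import math


def full_length_fraction(step_efficiencies):
    """Fraction of chains that complete every addition step.

    Assumes independent steps, each with a fractional efficiency in [0, 1];
    the efficiencies may differ by position in the sequence.
    """
    for e in step_efficiencies:
        if not 0.0 <= e <= 1.0:
            raise ValueError("each efficiency must lie between 0 and 1")
    return math.prod(step_efficiencies)


# Hypothetical example: 24 addition steps at an assumed 99% efficiency each.
yield_24 = full_length_fraction([0.99] * 24)
print(f"full-length fraction: {yield_24:.3f}")
```

The remainder, one minus this fraction, is distributed over the truncated "failure sequences."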

Kinetic factor - numerical factors that predict the rate at which an
oligonucleotide hybridizes to its complementary sequence or the rate at which the
hybridized sequence dissociates from its complement are called kinetic factors.
Examples of kinetic factors are steric factors calculated via molecular modeling or
measured experimentally, rate constants calculated via molecular dynamics
simulations, associative rate constants, dissociative rate constants, enthalpies of
activation, entropies of activation, free energies of activation, and the like.

Predicted duplex melting temperature - the temperature at which an
oligonucleotide mixed with a hybridizable nucleotide sequence is predicted to form
a duplex structure (double-helix hybrid) with 50% of the hybridizable sequence. At
higher temperatures, the amount of duplex is less than 50%; at lower
temperatures, the amount of duplex is greater than 50%. The melting
temperature Tm (°C) is calculated from the enthalpy (ΔH), entropy (ΔS) and C, the
concentration of the most abundant duplex component (for hybridization arrays,
the soluble hybridization target), using the equation

$$T_m = \frac{\Delta H}{\Delta S + R \ln C} - 273.15,$$

where R is the gas constant, 1.987 cal/(mole-°K). For longer sequences (>100
nucleotides), Tm can also be estimated from the mole fraction (G+C), xG+C, using
the equation

$$T_m = 81.5 + 41.0\,x_{G+C}.$$
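
The melting temperature relations can be sketched in a few lines. The code assumes the two-state form Tm = ΔH/(ΔS + R ln C) - 273.15 with ΔH in cal/mole and ΔS in cal/(mole-°K), the (G+C) estimate Tm = 81.5 + 41.0 xG+C, and the salt correction T'm = Tm + 16.6 log([Na+]) given in the definition that follows; the function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math

R = 1.987  # gas constant, cal/(mole-K)


def tm_two_state(dh_cal, ds_cal, conc_molar):
    """Tm in deg C from enthalpy (cal/mole), entropy (cal/(mole-K)) and C (M)."""
    return dh_cal / (ds_cal + R * math.log(conc_molar)) - 273.15


def tm_from_gc(x_gc):
    """Tm in deg C for long (>100 nt) sequences from the mole fraction (G+C)."""
    return 81.5 + 41.0 * x_gc


def tm_salt_corrected(tm_at_1m, na_molar):
    """Correct a Tm computed for 1 M Na+ to another sodium concentration."""
    return tm_at_1m + 16.6 * math.log10(na_molar)


# Illustrative inputs only (not measured values).
tm = tm_two_state(dh_cal=-80000.0, ds_cal=-220.0, conc_molar=1e-6)
print(round(tm, 1), tm_from_gc(0.5), round(tm_salt_corrected(tm, 0.1), 1))
```

At 1 M sodium the logarithm vanishes, so the correction leaves Tm unchanged, which is consistent with 1 M being the reference condition.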



Melting temperature corrected for salt concentration - polynucleotide
duplex melting temperatures are calculated with the assumption that the
concentration of sodium ion, Na+, is 1 M. Melting temperatures T'm calculated for
duplexes formed at different salt concentrations are corrected via the
semi-empirical equation

$$T'_m\left([Na^+]\right) = T_m + 16.6 \log\left([Na^+]\right).$$

Predicted enthalpy, entropy and free energy of duplex formation - the
enthalpy (ΔH), entropy (ΔS) and free energy (ΔG) are thermodynamic state
functions, related by the equation

$$\Delta G = \Delta H - T\,\Delta S,$$

where T is the temperature in °K. In practice, the enthalpy and entropy are
predicted via a thermodynamic model of duplex formation (the "nearest neighbor"
model which is explained in more detail below), and used to calculate the free
energy and melting temperature.

Predicted free energy of the most stable intramolecular structure of an
oligonucleotide or its complement - single-stranded DNA and RNA molecules that
contain self-complementary sequences can form intramolecular secondary
structures. For instance, the oligonucleotide

    5'-ACTGGCAATCACAATTGCCAGTAA-3' (SEQ ID NO:1)

can base pair with itself, to form the structure

    5'-  ACTGGCAATCA
         |||||||||   C     (SEQ ID NO:1)
    3'-AATGACCGTTAA

where a vertical line indicates Watson-Crick base pair formation. Many such
structures are possible for a given sequence; two are of particular interest. The
first is the lowest energy "hairpin" structure (formed by folding a sequence back on
itself with a connecting loop at least 3 nucleotides long). The second is the lowest
energy structure that can be formed by including more complex topologies, such
as "bulge loops" (unpaired duplexes between two regions of base-paired duplex)
and cloverleaf structures, where 3 base-paired stretches meet at a triple-junction.
A good example of a complex secondary structure is the structure of a tRNA
molecule, an example of which, namely, yeast tRNA^Ala, is shown below.

For either type of structure, a value of the free energy of that structure can
be calculated, relative to the unpaired strand, by means of a thermodynamic
model similar to that used to calculate the free energy of a base-paired duplex
structure. Again, the free energy ΔG is calculated from the enthalpy ΔH and the
entropy ΔS at a given absolute temperature T via the equation

$$\Delta G = \Delta H - T\,\Delta S.$$

However, in this case there is the added difficulty that the lowest energy structure
must be found. For a simple hairpin structure, this optimization can be performed
via a relatively simple search algorithm. For more complex structures (such as a
cloverleaf) a dynamic programming algorithm, such as that implemented in the
program MFOLD, must be used.
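
A simple hairpin search of the kind mentioned above can be sketched as follows. This is not the search used in the Appendix and it does not compute a free energy: as a stated simplification, it ranks candidate hairpins by the number of consecutive Watson-Crick base pairs in a perfect stem, with a connecting loop of at least 3 nucleotides. A real implementation would score each candidate with nearest-neighbor and loop parameters instead. The name `best_hairpin` is hypothetical.

```python
# Watson-Crick pairs only; G/U wobble pairs are deliberately left out of this sketch.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def best_hairpin(seq, min_loop=3):
    """Return (stem_bp, i, j) for the longest perfect hairpin stem.

    i and j are the 0-based indices of the outermost paired bases.
    Returns (0, None, None) when no base pair can be formed.
    """
    s = seq.upper()
    n = len(s)
    best = (0, None, None)
    for left in range(n):
        # (left, right) is the innermost candidate pair; the bases strictly
        # between them form the loop, which must hold at least min_loop bases.
        for right in range(left + min_loop + 1, n):
            i, j, bp = left, right, 0
            while i >= 0 and j < n and (s[i], s[j]) in PAIRS:
                bp += 1
                i -= 1
                j += 1
            if bp > best[0]:
                best = (bp, i + 1, j - 1)
    return best


# SEQ ID NO:1 from the text: a 9-base-pair stem closed by a 4-nucleotide loop.
print(best_hairpin("ACTGGCAATCACAATTGCCAGTAA"))
```

The exhaustive scan over innermost pairs is quadratic in the number of candidate pairs and is adequate for oligonucleotide-length sequences; it cannot find bulged or branched structures, for which the dynamic programming approach is needed.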

Yeast tRNA^Ala - The RNA sequence includes many non-standard
ribonucleotides, such as D (5,6-dihydrouridine), m1G (1-methylguanosine), m2G
(N2-dimethylguanosine), ψ (pseudouridine), I (inosine), m1I (1-methylinosine) and
T (ribothymidine). Dots (·) mark (non-standard) G=U base pairs. The structure is
taken from A. L. Lehninger, et al., Principles of Biochemistry, 2nd Ed. (Worth
Publishers, New York, NY, 1993).

[Figure: cloverleaf secondary structure of the yeast tRNA (SEQ ID NO:2)]


Coupling efficiencies - chemosynthetic efficiencies are called coupling
efficiencies when the synthetic scheme involves successive attachment of
different monomers to a growing oligomer; a good example is oligonucleotide
synthesis via phosphoramidite coupling chemistry.



Algorithmic Operations:

Evaluating a parameter - determination of the numerical value of a
numerical descriptor of a property of an oligonucleotide sequence by means of a
formula, algorithm or look-up table.

Filter - a mathematical rule or formula that divides a set of numbers into
two subsets. Generally, one subset is retained for further analysis while the other
is discarded. If the division into two subsets is achieved by testing the numbers
against a simple inequality, then the filter is referred to as a "cut-off." In the
context of the current invention, an example by way of illustration and not
limitation is the statement "The predicted self structure free energy must be
greater than or equal to -0.4 kcal/mole," which can be used as a filter for
oligonucleotide sequences; this particular filter is also an example of a cut-off.

Filter set - a set of rules or formulae that successively winnow a set of
numbers by identifying and discarding subsets that do not meet specific criteria. In
the context of the current invention, an example by way of illustration and not
limitation is the compound statement "the predicted self structure free energy must
be greater than or equal to -0.4 kcal/mole and the predicted RNA/DNA
heteroduplex melting temperature must lie between 60°C and 85°C," which can
be used as a filter set for oligonucleotide sequences.
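
The filter and filter set definitions map naturally onto predicates applied in succession. The sketch below encodes the two illustrative rules quoted above (self structure free energy of at least -0.4 kcal/mole, melting temperature between 60 °C and 85 °C, taken here as inclusive bounds). The dictionary keys, function names and candidate values are hypothetical and exist only for this example.

```python
def self_structure_cutoff(oligo):
    """Cut-off: predicted self structure free energy >= -0.4 kcal/mole."""
    return oligo["dg_self"] >= -0.4


def tm_window(oligo):
    """Filter: predicted heteroduplex Tm between 60 and 85 deg C (inclusive)."""
    return 60.0 <= oligo["tm"] <= 85.0


def apply_filter_set(candidates, filters):
    """Successively winnow the candidates, discarding any that fail a filter."""
    survivors = list(candidates)
    for keep in filters:
        survivors = [c for c in survivors if keep(c)]
    return survivors


# Made-up parameter values, for illustration only.
candidates = [
    {"name": "probe_a", "dg_self": -0.1, "tm": 72.0},
    {"name": "probe_b", "dg_self": -1.3, "tm": 70.0},  # fails the cut-off
    {"name": "probe_c", "dg_self": 0.0, "tm": 91.5},   # fails the Tm window
]
kept = apply_filter_set(candidates, [self_structure_cutoff, tm_window])
print([c["name"] for c in kept])
```

Keeping each rule as a separate function lets the same filters be recombined into different filter sets without rewriting them.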

Examining a parameter - comparing the numerical value of a parameter to
some cutoff-value or filter.

Statistical sampling of a cluster - extraction of a subset of oligonucleotides
from a cluster of oligonucleotides based upon some statistical measure, such as
rank by oligonucleotide starting position in the sequence complementary to the
target sequence.

First quartile, median and third quartile - If a set of numbers is ranked by
value, then the value that divides the lower 1/4 from the upper 3/4 of the set is the
first quartile, the value that divides the set in half is the median and the value that
divides the lower 3/4 from the upper 1/4 of the set is the third quartile.
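
For a ranked set of numbers these three values can be obtained with the Python standard library. This is an illustrative sketch: the text does not specify how to interpolate between ranked values, so the "inclusive" method used here is an assumption, and the starting positions are made up.

```python
import statistics

# Hypothetical oligonucleotide starting positions within one cluster.
starts = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(..., n=4) returns the three cut points that divide the ranked
# data into four equal parts: first quartile, median, third quartile.
q1, median, q3 = statistics.quantiles(starts, n=4, method="inclusive")
print(q1, median, q3)
```

Such values give one possible statistical measure for sampling a cluster by rank of starting position.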

Poorly correlated - If it is not possible to perform a "good" prediction, as
defined via statistics, of one set of numbers from another set of numbers using a
simple linear model, then the two sets of numbers are said to be poorly correlated.
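
One way to quantify how well a simple linear model relates two sets of numbers is the correlation coefficient, i.e., the covariance divided by the square root of the product of the variances. The sketch below assumes population (1/N) normalization for both; the normalization cancels in the ratio, so the result does not depend on that choice. The function name `correlation` is hypothetical.

```python
import math


def correlation(x, y):
    """Correlation coefficient of two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# A perfectly linear relationship with positive, then negative, slope.
print(correlation([1, 2, 3], [2, 4, 6]), correlation([1, 2, 3], [3, 2, 1]))
```

Two parameters whose coefficient is close to zero would be regarded as poorly correlated; the threshold for "close" is a matter of statistical judgment that the text leaves open.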

Computer program - a written set of instructions that symbolically instructs
an appropriately configured computer to execute an algorithm that will yield
desired outputs from some set of inputs. The instructions may be written in one
or several standard programming languages, such as C, C++, Visual BASIC,
FORTRAN or the like. Alternatively, the instructions may be written by imposing a
template onto a general-purpose numerical analysis program, such as a
spreadsheet.



Experimental System Components:

Small organic molecule ~ a compound of molecular weight less than 1500, 
preferably 100 to 1000, more preferably 300 to 600 such as biotin, fluorescein. 

achieve a consensus behavior. In other words, the oligonucleotide sequences
should be sufficiently numerous that several possible probes overlap or fall within
a given region that is expected to yield acceptable hybridization efficiency. Since
the location of these regions is not known beforehand, the best strategy is to
equally space the probe sequences along the sequence that is hybridizable to the
target sequence. Since regions of acceptable hybridization efficiency are
generally on the order of 20 nucleotides in length, a practical strategy is to space
the starting nucleotides of the oligonucleotide sequences no more than five
basepairs apart. If computation time needed to calculate the predictive
parameters is not an issue, then the best strategy is to space the starting
nucleotides one nucleotide apart. An important feature of the present invention is
to determine oligonucleotides that are clustered along a region of the nucleotide
sequence. The individual predictions made for individual oligonucleotide
sequences are not very good. However, we have found that the predictions that
are experimentally observed tend to form contiguous clusters, while the spurious
predictions tend to be solitary. Thus, the number of oligonucleotides should be
sufficient to achieve the desired clustering.
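
The distinction between contiguous clusters and solitary predictions can be sketched as a grouping of the starting positions of the oligonucleotides that passed the filters. This is an illustration only; the function name, the `max_gap` and `min_size` parameters and their defaults are assumptions, not values taken from this disclosure.

```python
def contiguous_clusters(starts, max_gap=1, min_size=2):
    """Group starting positions into runs and drop solitary predictions.

    Positions belong to the same cluster when consecutive sorted positions
    differ by at most max_gap; clusters smaller than min_size are discarded.
    """
    clusters = []
    current = []
    for pos in sorted(set(starts)):
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    return [c for c in clusters if len(c) >= min_size]


# Hypothetical starting positions of oligonucleotides that passed a filter set.
print(contiguous_clusters([3, 4, 5, 12, 20, 21]))
```

With oligonucleotides spaced more than one nucleotide apart, `max_gap` would be set to the spacing so that neighboring probes still count as contiguous.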

Preferably, a set of overlapping sequences is chosen. To this end, the
subsequences are chosen so that there is overlap of at least one nucleotide from
one oligonucleotide to the next. More preferably, the overlap is two or more
nucleotides. Most preferably, the oligonucleotides are spaced one nucleotide
apart and the predetermined number is L-N+1 oligonucleotides where L is the
length of the nucleotide sequence and N is the length of the oligonucleotides. In
the latter situation, the unique oligonucleotides are of identical length N. Thus, a
set of overlapping oligonucleotides is a set of oligonucleotides that are
subsequences derived from some master sequence by subdividing that sequence
in such a way that each subsequence contains either the start or end of at least
one other subsequence in the set.

An example of the above for purposes of illustration and not limitation is
presented by the sequence ATGGACTTAGCATTCG (SEQ ID NO:3), from which the
following set of overlapping oligonucleotides can be identified:

    ATGGACTTAGCA (SEQ ID NO:4)
    TGGACTTAGCAT (SEQ ID NO:5)
    GGACTTAGCATT (SEQ ID NO:6)
    GACTTAGCATTC (SEQ ID NO:7)
    ACTTAGCATTCG (SEQ ID NO:8)

In this example the overlapping oligonucleotides are spaced one nucleotide apart.
In other words, there is overlap of all but one nucleotide from one oligonucleotide
to the next. In the example above, the original nucleotide sequence is 16
nucleotides long (L=16). The length of each of the overlapping oligonucleotides is
12 nucleotides (N=12) and there are L-N+1 = 5 oligonucleotides.
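
The enumeration in this example is a sliding window over the master sequence. A minimal sketch follows; the function name and the `step` parameter, which allows spacings greater than one nucleotide, are assumptions made for illustration.

```python
def overlapping_oligos(master, n, step=1):
    """Return the length-n subsequences of master whose starts are step apart.

    With step=1 this yields L - N + 1 oligonucleotides for a master
    sequence of length L.
    """
    if n <= 0 or step <= 0:
        raise ValueError("n and step must be positive")
    master = master.upper()
    return [master[i:i + n] for i in range(0, len(master) - n + 1, step)]


# SEQ ID NO:3 from the text, L = 16 and N = 12, giving 16 - 12 + 1 = 5 oligos.
for oligo in overlapping_oligos("atggacttagcattcg", 12):
    print(oligo)
```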

The length of the oligonucleotides may be the same or different and may
vary depending on the length of the nucleotide sequence. The length of the
oligonucleotides is determined by a practical compromise between the limits of
current chemistries for oligonucleotide synthesis and the need for longer
oligonucleotides, which exhibit greater binding affinity for the target sequence and
are more likely to occur only once in complicated mixtures of polynucleotide
targets. Usually, the length of the oligonucleotides is from about 10 to 50
nucleotides, more usually, from about 25 to 35 nucleotides.

In the next step of the method at least one parameter that is independently
predictive of the ability of each of the oligonucleotides of the set to hybridize to the
target nucleotide sequence is determined and evaluated for each of the above
oligonucleotides. Examples of such a parameter, by way of illustration and not
limitation, are parameters selected from the group consisting of composition
factors, thermodynamic factors, chemosynthetic efficiencies, kinetic factors and
mathematical combinations of these quantities.

The determination of a parameter may be carried out by known methods.
For example, melting temperature of the oligonucleotide/target duplex may be
determined using the nearest neighbor method and parameters appropriate for
the nucleic acids involved. For DNA/DNA parameters, see J. SantaLucia Jr.,
et al., (1996) Biochemistry, 35:3555. For RNA/DNA parameters, see N.
Sugimoto, et al., (1995) Biochemistry, 34:11211. Briefly, these methods are
based on the observation that the thermodynamics of a nucleic acid duplex can be
modeled as the sum of a term arising from the entire duplex and a set of terms
arising from overlapping pairs of nucleotides ("nearest neighbor" model). For a
discussion of the nearest neighbor model see J. SantaLucia Jr., et al., (1996)
Biochemistry, supra, and N. Sugimoto, et al., (1995) Biochemistry, supra. For
example, the enthalpy ΔH of the duplex formed by the sequence

    ATGGACTTAGCA (SEQ ID NO:4)

and its perfect complement can be approximated by the equation

$$\Delta H = H_{init} + H_{AT} + H_{TG} + H_{GG} + H_{GA} + H_{AC} + H_{CT} + H_{TT} + H_{TA} + H_{AG} + H_{GC} + H_{CA}.$$

In the above equation, the term Hinit is the initiation enthalpy for the entire duplex,
while the terms HAT, ..., HCA are the so-called "nearest neighbor" enthalpies.
Similar equations can be written for the entropy, for the corresponding quantities
for RNA homoduplexes, or for DNA/RNA heteroduplexes. The free energy can
then be calculated from the enthalpy, entropy and absolute temperature, as
described previously.
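
The nearest neighbor sum can be sketched as a walk over the overlapping dinucleotide steps of the sequence. The code below is illustrative only: the function names are hypothetical, the caller is assumed to supply a table keyed by all sixteen dinucleotides, and the numbers in the usage example are placeholders, not published thermodynamic parameters.

```python
def nearest_neighbor_steps(seq):
    """Return the overlapping dinucleotide steps of a sequence, 5' to 3'."""
    s = seq.upper()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def duplex_enthalpy(seq, nn_enthalpy, h_init):
    """Sum the initiation term and one nearest neighbor term per step.

    nn_enthalpy maps each dinucleotide (e.g. "AT") to its enthalpy term;
    real values must come from a published parameter set.
    """
    return h_init + sum(nn_enthalpy[step] for step in nearest_neighbor_steps(seq))


# SEQ ID NO:4 yields the eleven steps AT, TG, GG, GA, AC, CT, TT, TA, AG, GC, CA.
print(nearest_neighbor_steps("ATGGACTTAGCA"))

# Placeholder table: every step set to -1.0 purely to exercise the function.
placeholder = {a + b: -1.0 for a in "ACGT" for b in "ACGT"}
print(duplex_enthalpy("ATGGACTTAGCA", placeholder, h_init=0.5))
```

The entropy is accumulated the same way with its own table, after which the free energy follows from ΔG = ΔH - TΔS.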

Predicted free energy of the most stable intramolecular structure of an
oligonucleotide (ΔGmfold) may be determined using the nucleic acid folding
algorithm MFOLD and parameters appropriate for the oligonucleotide, e.g., DNA
or RNA. For MFOLD, see J.A. Jaeger, et al., (1989), supra. For DNA folding
parameters, see J. SantaLucia Jr., et al., (1996), supra. Briefly, these methods
operate in two steps. First, a map of all possible compatible intramolecular base
pairs is made. Second, the global minimum of the free energy of the various
possible base pairing configurations is found, using the nearest neighbor model to
estimate the enthalpy and entropy, the user input temperature to complete the
calculation of free energy, and a dynamic programming algorithm to find the global
minimum. The algorithm is computationally intensive; calculation times scale as
the third power of the sequence length.
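
The cubic scaling can be seen in a much simpler dynamic programming scheme. The following Python sketch is not MFOLD and does not use the nearest neighbor energy model; it is a Nussinov-style recursion that merely maximizes the number of Watson-Crick base pairs, offered as an assumption-laden stand-in to show the table-filling structure. The function name and the minimum loop size of 3 are illustrative choices.

```python
# Watson-Crick pairs for DNA only (an assumption of this sketch).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style).

    best[i][j] holds the optimum for the subsequence seq[i..j].
    Three nested loops (span, i, k) give cubic running time.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case: position i is unpaired
            if (seq[i], seq[j]) in PAIRS:
                # case: i pairs with j, enclosing the interior i+1..j-1
                score = max(score, best[i + 1][j - 1] + 1)
            for k in range(i + 1, j):
                # case: split into two independent sub-structures
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]
```

A free energy minimizer replaces the pair count with energy terms and the maximum with a minimum, but the triple loop, and hence the third-power scaling, has the same origin.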
The following Table 1 summarizes groups of parameters that are
independently predictive of the ability of each of the oligonucleotides to hybridize 
to the target nucleotide sequence together with a reference to methods for their 
determination. Parameters within a given group are known or expected to be 
strongly correlated to one another, while parameters in different groups are known 
or expected to be poorly correlated with one another. 



Table 1

Group | Parameter                                         | Source or Reference
------+---------------------------------------------------+------------------------------------------------
I     | duplex enthalpy, ΔH                               | Santa Lucia et al., 1996; Sugimoto et al., 1995
      | duplex entropy, ΔS                                | Santa Lucia et al., 1996; Sugimoto et al., 1995
      | duplex free energy, ΔG                            | ΔG = ΔH - TΔS (see text)
      | melting temperature, Tm                           | (see text)
      | mole fraction (or percent) G+C                    | self-explanatory
      | subsequence duplex enthalpy                       | Santa Lucia et al., 1996; Sugimoto et al., 1995
      | subsequence duplex entropy                        | Santa Lucia et al., 1996; Sugimoto et al., 1995
      | subsequence duplex free energy                    | ΔG = ΔH - TΔS (see text)
      | subsequence duplex Tm                             | (see text)
      | subsequence duplex mole fraction (or percent) G+C | self-explanatory
II    | intramolecular enthalpy, ΔHmfold                  | Jaeger et al., 1989; Santa Lucia et al., 1996
      | intramolecular entropy, ΔSmfold                   | Jaeger et al., 1989; Santa Lucia et al., 1996
      | intramolecular free energy, ΔGmfold               | ΔG = ΔH - TΔS (see text)
      | hairpin enthalpy, ΔHhairpin                       | Jaeger et al., 1989; Santa Lucia et al., 1996
      | hairpin entropy, ΔShairpin                        | Jaeger et al., 1989; Santa Lucia et al., 1996
      | hairpin free energy, ΔGhairpin                    | ΔG = ΔH - TΔS (see text)
      | intramolecular partition function, Z              | $Z = \sum_{k} \exp(-\Delta G_k / RT)$, k over structures
III   | sequence complexity                               | Altschul et al., 1994
      | sequence information content                      | Altschul et al., 1994
IV    | steric factors                                    | molecular modeling or experiment
      | molecular dynamic simulation                      | Weber & Hefland, 1979
      | enthalpy, entropy & free energy of activation     | measured experimentally
      | association & dissociation rates                  | Patzel & Sczakiel, 1998
V     | oligonucleotide chemosynthetic efficiencies       | measured experimentally
VI    | target synthetic efficiencies                     | measured experimentally



In a next step of the present method, a subset of oligonucleotides within the
predetermined number of unique oligonucleotides is identified based on the above
evaluation of the parameter. A number of mathematical approaches may be
followed to sort the oligonucleotides based on a parameter. In one approach a
cut-off value is established. The cut-off value is adjustable and can be optimized
relative to one or more training data sets. This is done by first establishing some
metric for how well a cut-off value is performing; for example, one might use the
normalized signal observed for each oligonucleotide in the training set. Once
such a metric is established, the cut-off value can be numerically optimized to
maximize the value of that metric, using optimization algorithms well known to the
art. Alternatively, the cut-off value can be estimated using graphical methods, by
graphing the value of the metric as a function of one or more parameters, and
then establishing cut-off values that bracket the region of the graph where the
chosen metric exceeds some chosen threshold value. In essence, the cut-off
values are chosen so that the rule set, applied to the training data, maximizes the
inclusion of oligonucleotides that exhibit good hybridization efficiency and
minimizes the inclusion of oligonucleotides that exhibit poor hybridization
efficiency.
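
One possible numerical optimization of a single cut-off value is sketched below in Python, by way of illustration only. The metric used here (fraction of good oligonucleotides retained minus fraction of poor oligonucleotides retained) and the signal level that separates "good" from "poor" are assumptions of this sketch; the text above leaves the choice of metric open. The function name is hypothetical.

```python
def best_cutoff(values, signals, good_signal=0.5):
    """Scan candidate cut-offs for one parameter against training data.

    values  : parameter value (e.g., Tm) for each training oligonucleotide
    signals : normalized hybridization signal for the same oligonucleotides
    Returns (cutoff, metric) for the cut-off with the highest metric, where
    an oligonucleotide is retained when its value >= cutoff.
    """
    good = [v for v, s in zip(values, signals) if s >= good_signal]
    poor = [v for v, s in zip(values, signals) if s < good_signal]
    if not good or not poor:
        raise ValueError("training set needs both good and poor examples")
    best = None
    for cutoff in sorted(set(values)):
        kept_good = sum(v >= cutoff for v in good) / len(good)
        kept_poor = sum(v >= cutoff for v in poor) / len(poor)
        metric = kept_good - kept_poor
        if best is None or metric > best[1]:
            best = (cutoff, metric)
    return best
```

An exhaustive scan over the observed values is adequate for one parameter; for several parameters at once, a general-purpose optimizer or the graphical method described above may be more practical.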

A preferred approach to performing such a graph-based optimization of
filter parameters is shown in Fig. 3. In Fig. 3, hybridization data from several
different genes have been used to prepare a contour plot of relative hybridization
intensity as a function of DNA/RNA heteroduplex melting temperature and free
energy of the most stable intramolecular structure of the probe. Contours are
shown only for regions for which there are data; the white space outside of the
outermost contour indicates that there are no experimental data for that region.
The details of how the data were obtained can be found in Example 1 below. A
summary of the sequences and number of data points employed is shown in
Table 2 below. The measured hybridization intensities for each data set were
normalized prior to construction of the contour plot depicted in Fig. 3 by dividing
each observed intensity by the maximum intensity observed for that gene. In
addition, differences in hybridization salt concentrations and hybridization
temperatures were accounted for by using the salt concentration-corrected values
of the melting temperatures and by subtracting the hybridization temperature from
each predicted melting temperature, respectively. The filter set determined by
examination of Fig. 3 is indicated by both the dotted open box in the figure and by
the inequalities above the box.

One way in which such a contour plot may be prepared involves the use of
an appropriate software application such as Microsoft® Excel® or the like. For
example, the cross-tabulation tool may be used in the Microsoft® Excel®
program. Data is accumulated into rectangular bins that are 0.5 kcal ΔGmfold
wide and 2.5°C Tm wide. In each bin the average values of ΔGmfold, Tm - Thyb,
and the normalized hybridization intensity are calculated. The data is output to
the software application DeltaGraph® (Deltapoint, Inc., Monterey, CA) and the
contour plot is prepared using the tools and instructions provided.
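
The normalization and binning steps may equally be carried out outside a spreadsheet. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, bins are indexed by the floor of each value divided by the bin width, and each record is assumed to hold ΔGmfold, Tm - Thyb, and an intensity already normalized for its gene.

```python
import math
from collections import defaultdict


def normalize(intensities):
    """Divide each intensity by the maximum observed for that gene."""
    peak = max(intensities)
    return [value / peak for value in intensities]


def bin_averages(records, dg_width=0.5, tm_width=2.5):
    """Average (dG_mfold, Tm - Thyb, intensity) within rectangular bins.

    records: iterable of (dg_mfold, tm_minus_thyb, normalized_intensity).
    Returns {(dg_bin_index, tm_bin_index): (mean_dg, mean_dtm, mean_int)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for dg, dtm, intensity in records:
        key = (math.floor(dg / dg_width), math.floor(dtm / tm_width))
        acc = sums[key]
        acc[0] += dg
        acc[1] += dtm
        acc[2] += intensity
        acc[3] += 1
    return {
        key: (a / n, b / n, c / n) for key, (a, b, c, n) in sums.items()
    }
```

The resulting per-bin averages are the grid that a contouring program would then plot.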



Table 2

Target (GenBank Accession No.)                                       | Target Strand | No. Data Points | Thyb | [Na+] Correction
---------------------------------------------------------------------+---------------+-----------------+------+-----------------
HIV protease-reverse transcriptase (PRT)(a) (M15654)                 | sense         | 1,022           | 35°C | -1.4°C
HIV protease-reverse transcriptase (PRT)(a) (M15654)                 | antisense     | 1,041           | 30°C | -1.4°C
HIV protease-reverse transcriptase (PRT)(b) (M15654)                 | sense         | 88              |      | -1.4°C
Human G3PDH (glyceraldehyde-3-phosphate dehydrogenase)(b) (X01677)   | antisense     | 93              | 35°C | -1.4°C
Human p53(b) (X02469)                                                | antisense     | 93              | 35°C | -1.4°C
Rabbit β-globin(c) (K03256)                                          | antisense     | 106             | 30°C | 0°C

(a) Data from Affymetrix GeneChip™ Array
(b) Data from biotinylated probes bound to streptavidin-coated microtiter wells
(c) Literature data: see N. Milner, K. U. Mir & E. M. Southern (1997) Nature Biotech. 15, 537-541.

Once the cut-off value is selected, a subset of oligonucleotides having 
parameter values greater than or equal to the cut-off value is identified. This refers 
to the inclusion of oligonucleotides in a subset based on whether the value of a 
predictive parameter satisfies an inequality. 

Examples of identifying a subset of oligonucleotides by establishing cut-off
values for predictive parameters are as follows: for melting temperature an
inequality might be 60°C ≤ Tm; for predicted free energy an inequality, preferably,
might be

ΔGmfold ≥ [...] kcal/mole.

In a variation of the above, both a maximum and a minimum cut-off value
may be selected. A subset of oligonucleotides is identified whose values fall
within the maximum and minimum values, i.e., values greater than or equal to the
minimum cut-off value and less than or equal to the maximum cut-off value. An
example of this approach for melting temperature might be the inequality

60°C ≤ Tm ≤ 85°C.

With regard to cut-off values for Tm, the lower limit is most important, and is
preferably Tm = Thyb, more preferably, Tm = Thyb + 15°C. The upper cut-off is
important when the sequence region under consideration is unusually rich in G
and C, and is preferably Tm = Thyb + 40°C. With regard to ΔGmfold, the cut-off
value is usually greater than or equal to -1.0 kcal/mole. As mentioned above, the
cut-off values preferably are determined from real data through experimental
observations.
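
A filter set of this kind reduces to a pair of inequality tests. The Python sketch below is illustrative only; the function name is hypothetical, and the default limits simply restate the preferred values given above (Thyb + 15°C, Thyb + 40°C and -1.0 kcal/mole), which in practice would be tuned against experimental data.

```python
def passes_filters(tm, dg_mfold, t_hyb,
                   tm_margin_min=15.0, tm_margin_max=40.0, dg_min=-1.0):
    """Return True if an oligonucleotide satisfies the Tm and dG filters.

    tm       : predicted duplex melting temperature, deg C
    dg_mfold : predicted intramolecular free energy, kcal/mole
    t_hyb    : hybridization temperature, deg C
    """
    tm_ok = t_hyb + tm_margin_min <= tm <= t_hyb + tm_margin_max
    dg_ok = dg_mfold >= dg_min
    return tm_ok and dg_ok


# Example: keep only candidates that satisfy both inequalities at 35 deg C.
candidates = [("probe_a", 55.0, -0.5), ("probe_b", 45.0, -0.2),
              ("probe_c", 60.0, -2.0)]
kept = [name for name, tm, dg in candidates if passes_filters(tm, dg, 35.0)]
```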

In another approach the parameter values may be converted into 
dimensionless numbers. The parameter value is converted into a dimensionless 
number by determining a dimensionless score for each parameter resulting in a 
distribution of scores having a mean value of zero and a standard deviation of 
one. The dimensionless score is a number that is used to rank some object (such 
as an oligonucleotide) to which that score relates. A score that has no units (i.e., 
a pure number) is called a dimensionless score. 

In one approach the following equations are used for converting the values
of said parameters into dimensionless numbers:

$$s_{i,x} = \frac{x_i - \langle x \rangle}{\sigma_{\{x\}}}$$

where s_{i,x} is the dimensionless score derived from parameter x calculated for
oligonucleotide i, x_i is the value of parameter x calculated for oligonucleotide i, <x>
is the average of parameter x calculated for all of the oligonucleotides under
consideration for a given nucleotide sequence target, and σ{x} is the standard
deviation of parameter x calculated for all of the oligonucleotides under
consideration for a given nucleotide sequence target, and is given by the equation

$$\sigma_{\{x\}} = \sqrt{\frac{\sum_{i=1}^{M} \left( x_i - \langle x \rangle \right)^2}{M - 1}}$$

where M is the number of oligonucleotides. The resulting distribution of scores,
{s}, has a mean value of zero and a standard deviation of one. These properties
can be important for a combination of the scores discussed below.
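
The conversion to dimensionless scores may be sketched in Python as follows. This is an illustration, not the Appendix code; the function name is hypothetical, and the sketch assumes the sample standard deviation (division by M - 1), which is what `statistics.stdev` computes.

```python
import statistics


def dimensionless_scores(values):
    """Convert raw parameter values to scores with mean 0 and std 1.

    values: one parameter (e.g., Tm) evaluated for every candidate
    oligonucleotide of a given target. Requires at least two values
    that are not all identical.
    """
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (M - 1)
    if sigma == 0:
        raise ValueError("all values identical; scores are undefined")
    return [(v - mean) / sigma for v in values]
```

Because every parameter is brought to the same scale, scores derived from quantities with different units (°C, kcal/mole) can later be added together.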

The use of a dimensionless number approach may further include
calculating a combination score S_i by evaluating a weighted average of the
individual values of the dimensionless scores s_{i,x} by the equation:

$$S_i = \sum_{x} q_x \, s_{i,x}$$

where q_x is the weight assigned to the score derived from parameter x, the
individual values of q_x are always greater than zero, and the sum of the weights q_x
is unity.

In another variation of the above approach, the method of calculation of the 
composite parameter is optimized based on the correlation of the individual 
composite scores to real data, as explained more fully below. 

In one approach the calculation of the composite score further involves
determining a moving window-averaged combination score <S_i> for the ith probe
by the equation:

$$\langle S_i \rangle = \frac{1}{w} \sum_{j = i - \frac{w-1}{2}}^{i + \frac{w-1}{2}} S_j$$

where w is the length of the window for averaging (i.e., w nucleotides long), and
then applying a cutoff filter to the value of <S_i>. This procedure results in
smoothing (smoothing procedure) by turning each score into a consensus metric
for a set of w adjacent oligonucleotide probes. The score, referred to as the
"smoothed score," is essentially continuous rather than a few discrete values. The
value of the smoothed score is strongly influenced by clustering of scores with
high or low values; window averaging therefore provides a measurement of
cluster size.
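
The weighted combination and the window average may be sketched together in Python. The function names are hypothetical, and two conventions are assumptions of this sketch rather than statements of the method: the window is taken to be centered on probe i with an odd length w, and positions whose window would extend past either end of the sequence are reported as None.

```python
def combination_scores(score_table, weights):
    """Weighted sum of dimensionless scores for each oligonucleotide.

    score_table: {parameter_name: [score for oligo 0, oligo 1, ...]}
    weights    : {parameter_name: q_x}, all q_x > 0 and summing to 1
    """
    if any(q <= 0 for q in weights.values()):
        raise ValueError("weights must be greater than zero")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to unity")
    n = len(next(iter(score_table.values())))
    return [
        sum(weights[x] * score_table[x][i] for x in weights)
        for i in range(n)
    ]


def window_average(scores, w):
    """Centered moving average over w adjacent probes (w odd)."""
    if w < 1 or w % 2 == 0:
        raise ValueError("window length must be a positive odd integer")
    half = (w - 1) // 2
    averaged = []
    for i in range(len(scores)):
        if i - half < 0 or i + half >= len(scores):
            averaged.append(None)  # window not fully inside the sequence
        else:
            averaged.append(sum(scores[i - half:i + half + 1]) / w)
    return averaged
```

A cutoff filter would then be applied to the non-None entries of the averaged list.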

An advantage of the dimensionless score approach to the probe prediction
algorithm is that it is easy to objectively optimize. In one approach to training the
algorithm, optimization of the weights q_x above may be performed by varying the
values of the weights so that the correlation coefficient ρ{<S_i>},{V_i} between the set of
window-averaged combination scores {<S_i>} and a set of calibration experimental
measurements {V_i} is maximized. The correlation coefficient ρ{<S_i>},{V_i} is calculated
from the equation

$$\rho_{\{\langle S_i \rangle\},\{V_i\}} = \frac{\mathrm{Covariance}\left(\{\langle S_i \rangle\}, \{V_i\}\right)}{\sigma_{\{\langle S_i \rangle\}} \, \sigma_{\{V_i\}}}$$

where M is the number of window-averaged combination dimensionless scores
and the number of corresponding measurements, the covariance is as defined
earlier (see earlier equations) and σ{<S_i>} and σ{V_i} are the standard deviations of
{<S_i>} and {V_i}, as defined previously. An example of this approach is shown in
Example 2, below.
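
For two parameters, the weight optimization can be illustrated with a simple grid search in Python. The sketch is a simplification under stated assumptions: the function names are hypothetical, the window-averaging step is omitted for brevity, the covariance and standard deviations use the M - 1 convention, and the second weight is fixed at one minus the first so that the weights sum to unity.

```python
import statistics


def pearson(xs, ys):
    """Correlation coefficient: covariance over product of std devs."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    cov /= len(xs) - 1
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))


def best_weight(scores_a, scores_b, measured, steps=99):
    """Grid search for the weight q on scores_a (1 - q on scores_b).

    Returns (q, rho) maximizing the correlation between the combined
    scores and the calibration measurements. q stays strictly in (0, 1).
    """
    best_q, best_rho = None, -2.0
    for k in range(1, steps + 1):
        q = k / (steps + 1)
        combined = [q * a + (1 - q) * b for a, b in zip(scores_a, scores_b)]
        rho = pearson(combined, measured)
        if rho > best_rho:
            best_q, best_rho = q, rho
    return best_q, best_rho
```

With more than two parameters the grid becomes impractical, and a general-purpose optimizer constrained to positive weights summing to one would be the natural replacement.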

In another approach the parameter is derived from one or more factors by
mathematical transformation of the factors. This involves the calculation of a new
predictive parameter from one or more existing predictive parameters, by means
of an equation. For instance, the equilibrium constant K_open for formation of an
oligonucleotide with no intramolecular structure from its structured form can be
calculated from the intramolecular structure free energy ΔGmfold, using the
equation:

$$K_{open} = \exp\left(\frac{\Delta G_{mfold}}{RT}\right)$$

where R is the gas constant and T is the absolute temperature.
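
As a numerical illustration, the transformation can be written in a few lines of Python. The sketch assumes the relation K_open = exp(ΔGmfold / RT), with ΔGmfold in kcal/mole taken as the (negative) free energy of forming the intramolecular structure; the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mole*K)


def k_open(dg_mfold, temp_celsius):
    """Equilibrium constant for opening the intramolecular structure.

    dg_mfold     : free energy of the most stable structure, kcal/mole
                   (zero means no predicted structure)
    temp_celsius : temperature in deg C, converted to kelvin internally
    """
    temp_kelvin = temp_celsius + 273.15
    return math.exp(dg_mfold / (R_KCAL * temp_kelvin))
```

Under this convention an unstructured oligonucleotide (ΔGmfold = 0) gives K_open = 1, and increasingly stable structures give progressively smaller values.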



In a next step of the method, oligonucleotides in the subset are then
identified that are clustered along a region of the nucleotide sequence that is
hybridizable to the target nucleotide sequence. For example, consider a set of
overlapping oligonucleotides identified by dividing a nucleotide sequence into
subsequences. A subset of the oligonucleotides is obtained as described above.
In general, this subset is obtained by applying a rule that rejects some members
of the set. For the remaining members of the set, namely, the subset, there will
be some average number of nucleotides in the nucleotide sequence between the
first nucleotides of adjacent remaining subsequences. If, for some sub-region of
the nucleotide sequence, the average number of nucleotides in the nucleotide
sequence between the first nucleotides of adjacent remaining subsequences is
less than the average for the entire nucleotide sequence, then the
oligonucleotides are clustered. The smaller the average number of nucleotides
between the first nucleotides of adjacent oligonucleotides, the stronger the
clustering. The strongest clustering occurs when there are no intervening
nucleotides between adjacent starting nucleotides. In this case, the
oligonucleotides are said to be contiguous and may be referred to as contiguous
sequence elements or "contigs."

Accordingly, in this step oligonucleotides are sorted based on length of
contiguous sequence elements. Oligonucleotides in the subset determined above
are identified that are contiguous along a region of the input nucleic acid
sequence. The length of each contig is equal to the number of
oligonucleotides in the contig; namely, oligonucleotides from the above step
whose complements begin at positions m+1, m+2, ..., m+k in the target sequence
form a contig of length k. Contigs can be identified and contig length can be
calculated using, for example, a Visual Basic® module that can be incorporated
into a Microsoft® Excel workbook.
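
The same bookkeeping can be sketched in Python rather than Visual Basic. The function below is a hypothetical illustration, not the module referred to above: it takes the starting positions of the oligonucleotides that passed the filters and groups consecutive positions into contigs.

```python
def find_contigs(start_positions):
    """Group filtered oligonucleotides into contiguous sequence elements.

    start_positions: starting positions (in the sequence) of the
    oligonucleotides that satisfied all filters, in any order.
    Returns a list of contigs, each a list of consecutive positions.
    """
    contigs = []
    for pos in sorted(set(start_positions)):
        if contigs and pos == contigs[-1][-1] + 1:
            contigs[-1].append(pos)  # extends the current contig
        else:
            contigs.append([pos])    # starts a new contig
    return contigs


def rank_by_length(contigs):
    """Longest contigs first; ties keep their original order."""
    return sorted(contigs, key=len, reverse=True)
```

For example, starting positions 1, 2, 3 and 5 give one contig of length 3 and one single oligonucleotide.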

Cluster size can be defined in several ways: 
For contiguous clusters, the size is simply the number of adjacent oligonucleotides 
in the cluster. Again, this may also be referred to as contiguous sequence 
elements. The number may also be referred to as "contig length". For example, 
consider the nucleotide sequence discussed above, namely, 
ATGGACTTAGCATTCG (SEQ ID NO:3) and the identified set of overlapping 
oligonucleotides 
rhodamine and other dyes, tetracycline and other protein binding molecules, and
haptens, etc. The small organic molecule can provide a means for attachment of 
a nucleotide sequence to a label or to a support. 

Support or surface -- a porous or non-porous water insoluble material. The
surface can have any one of a number of shapes, such as strip, plate, disk, rod,
particle, including bead, and the like. The support can be hydrophilic or capable
of being rendered hydrophilic and includes inorganic powders such as glass,
silica, magnesium sulfate, and alumina; natural polymeric materials, particularly
cellulosic materials and materials derived from cellulose, such as fiber containing
papers, e.g., filter paper, chromatographic paper, etc.; synthetic or modified
naturally occurring polymers, such as nitrocellulose, cellulose acetate, poly(vinyl
chloride), polyacrylamide, cross linked dextran, agarose, polyacrylate,
polyethylene, polypropylene, poly(4-methylbutene), polystyrene,
polymethacrylate, poly(ethylene terephthalate), nylon, poly(vinyl butyrate), etc.;
either used by themselves or in conjunction with other materials; glass available
as Bioglass, ceramics, metals, and the like. Natural or synthetic assemblies such
as liposomes, phospholipid vesicles, and cells can also be employed.

Binding of oligonucleotides to a support or surface may be accomplished
by well-known techniques, commonly available in the literature. See, for example,
A. C. Pease, et al., Proc. Nat. Acad. Sci. USA, 91:5022-5026 (1994).

Label -- a member of a signal producing system. Usually the label is part
of a target nucleotide sequence or an oligonucleotide probe, either being
conjugated thereto or otherwise bound thereto or associated therewith. The label
is capable of being detected directly or indirectly. Labels include (i) reporter
molecules that can be detected directly by virtue of generating a signal, (ii)
specific binding pair members that may be detected indirectly by subsequent
binding to a cognate that contains a reporter molecule, (iii) oligonucleotide primers
that can provide a template for amplification or ligation or (iv) a specific
polynucleotide sequence or recognition sequence that can act as a ligand such as
for a repressor protein, wherein in the latter two instances the oligonucleotide
primer or repressor protein will have, or be capable of having, a reporter
molecule. In general, any reporter molecule that is detectable can be used.
The reporter molecule can be isotopic or nonisotopic, usually non-isotopic,
and can be a catalyst, such as an enzyme, a polynucleotide coding for a catalyst,
promoter, dye, fluorescent molecule, chemiluminescent molecule, coenzyme,
enzyme substrate, radioactive group, a small organic molecule, amplifiable
polynucleotide sequence, a particle such as latex or carbon particle, metal sol,
crystallite, liposome, cell, etc., which may or may not be further labeled with a
dye, catalyst or other detectable group, and the like. The reporter molecule can
be a fluorescent group such as fluorescein, a chemiluminescent group such as
luminol, a terbium chelator such as N-(hydroxyethyl) ethylenediaminetriacetic acid
that is capable of detection by delayed fluorescence, and the like.

The label is a member of a signal producing system and can generate a 
detectable signal either alone or together with other members of the signal 
producing system. As mentioned above, a reporter molecule can be bound 
directly to a nucleotide sequence or can become bound thereto by being bound to 
an sbp member complementary to an sbp member that is bound to a nucleotide 
sequence. Examples of particular labels or reporter molecules and their detection 
can be found in U.S. Patent No. 5,508,178 issued April 16, 1996, at column 11,
line 66, to column 14, line 33, the relevant disclosure of which is incorporated 
herein by reference. When a reporter molecule is not conjugated to a nucleotide 
sequence, the reporter molecule may be bound to an sbp member 
complementary to an sbp member that is bound to or part of a nucleotide 
sequence. 

Signal Producing System -- the signal producing system may have one or 
more components, at least one component being the label. The signal producing 
system generates a signal that relates to the presence or amount of a target 
polynucleotide in a medium. The signal producing system includes all of the 
reagents required to produce a measurable signal. Other components of the 
signal producing system may be included in a developer solution and can include 
substrates, enhancers, activators, chemiluminescent compounds, cofactors, 
inhibitors, scavengers, metal ions, specific binding substances required for 
binding of signal generating substances, and the like. Other components of the 
signal producing system may be coenzymes, substances that react with enzymic 
products, other enzymes and catalysts, and the like. The signal producing system
provides a signal detectable by external means, by use of electromagnetic
radiation, desirably by visual examination. Signal-producing systems that may be 
employed in the present invention are those described more fully in U.S. Patent 
No. 5,508,178, the relevant disclosure of which is incorporated herein by 
reference. 

Ancillary Materials -- Various ancillary materials will frequently be employed
in the methods and assays utilizing oligonucleotide probes designed in
accordance with the present invention. For example, buffers and salts will
normally be present in an assay medium, as well as stabilizers for the assay
medium and the assay components. Frequently, in addition to these additives,
proteins may be included, such as albumins, organic solvents such as formamide,
quaternary ammonium salts, polycations such as spermine, surfactants,
particularly non-ionic surfactants, binding enhancers, e.g., polyalkylene glycols, or
the like.

DETAILED DESCRIPTION OF THE INVENTION 
The invention is directed to methods or algorithms for predicting
oligonucleotides specific for a nucleic acid target where the oligonucleotides
exhibit a high potential for hybridization. The algorithm uses parameters of the
oligonucleotide and the oligonucleotide/target nucleotide sequence duplex, which
can be readily predicted from the primary sequences of the target polynucleotide
and candidate oligonucleotides. In the methods of the present invention,
oligonucleotides are filtered based on one or more of these parameters, then
further filtered based on the sizes of clusters of oligonucleotides along the input
polynucleotide sequence. The methods or algorithms of the present invention
may be carried out using either relatively simple user-written subroutines or
publicly available stand-alone software applications (e.g., dynamic programming
algorithm for calculating self-structure free energies of oligonucleotides). The
parameter calculations may be orchestrated and the filtering algorithms may be
implemented using any of a number of commercially available computer programs
as a framework such as, e.g., Microsoft® Excel spreadsheet, Microsoft® Access
relational database and the like. The basic steps involved in the present methods
involve parsing a sequence that is complementary to a target nucleotide sequence
into a set of overlapping oligonucleotide sequences, evaluating one or more
parameters for each of the oligonucleotide sequences, said parameter or
parameters being predictive of probe hybridization to the target nucleotide
sequence, filtering the oligonucleotide sequences based on the values for each
parameter, filtering the oligonucleotide sequences based on the length of
contiguous sequence elements and ranking the contiguous sequence elements
based on their length. We have found that oligonucleotides in the longest
contiguous sequence elements generally show the highest hybridization
efficiencies.



The present methods are based on our recognition that oligonucleotides
showing high hybridization efficiencies tend to form clusters. It is believed that
this clustering reflects local regions of the target nucleotide sequence that are
unstructured and accessible for oligonucleotide binding. Oligonucleotides that are
contiguous along a region of the input nucleic acid sequence are identified. These
oligonucleotides are sorted based on the length of the contiguous sequence
elements. The sorting approach used in the present invention apparently serves
as a surrogate for the calculation of local secondary structure of the target
nucleotide sequence. This is supported by our observation that treatments
intended to eliminate long-range nucleic acid structure (e.g., random
fragmentation) do not eliminate the differences in hybridization yields across
oligonucleotide probe arrays. This implies that major determinants of efficient
hybridization are local regions of the target sequence. The identification of
contiguous sequence elements is a simple and efficient method for recognizing
clusters of such determinants and, thus, for identifying oligonucleotide probes that
exhibit high hybridization efficiency for a target nucleotide sequence.

As mentioned above, one embodiment of the present invention is a method
for predicting the potential of an oligonucleotide to hybridize to a target nucleotide
sequence. A predetermined number of unique oligonucleotides is identified. The
length of the oligonucleotides may be the same or different. The oligonucleotides
are unique in that no two of the oligonucleotides are identical. The unique
oligonucleotides are chosen to sample the entire length of a nucleotide sequence
that is hybridizable with the target nucleotide sequence. The actual number of
oligonucleotides is generally determined by the length of the nucleotide sequence
and the desired result. The number of oligonucleotides should be sufficient to
ATGGACTTAGCA (SEQ ID NO:4)
TGGACTTAGCAT (SEQ ID NO:5)
GGACTTAGCATT (SEQ ID NO:6)
GACTTAGCATTC (SEQ ID NO:7)
ACTTAGCATTCG (SEQ ID NO:8)

Suppose that, after calculation and evaluation of the predictive parameters, four
oligonucleotides remain:

ATGGACTTAGCA (SEQ ID NO:4)  |
TGGACTTAGCAT (SEQ ID NO:5)  |  contig
GGACTTAGCATT (SEQ ID NO:6)  |
ACTTAGCATTCG (SEQ ID NO:8)  |  single oligonucleotide

A "contig" encompassing three of the oligonucleotides of the subset is present
together with a single oligonucleotide. The contig length is 3 oligonucleotides.

Alternatively, cluster size at some position in the sequence hybridizable or
complementary to the target sequence may be defined as the number of
oligonucleotides whose center nucleotides fall inside a region of length M
centered about the position in question, divided by M. This definition of clustering
allows small gaps in clusters. In the example used above for contiguous clusters,
if M were 10, then the cluster size would step through the values 0/10,..., 0/10,
1/10, 2/10, 3/10, 3/10, 4/10, 4/10, 4/10, 4/10, 4/10, 3/10, 2/10, 1/10, 1/10, 0/10 as
the center of the window of length 10 passed through the cluster. In each fraction,
the numerator is the number of oligonucleotide sequences that have satisfied the
filter set and whose central nucleotides are within a window 10 nucleotides long,
centered about the nucleotide under consideration. The denominator (10) is
simply the window length.

Another alternative is to define the size of a cluster at some position in the
sequence hybridizable or complementary to the target sequence as the number of
oligonucleotide sequences overlapping that position. This definition is equivalent
to the last definition with M set equal to the oligonucleotide probe length and
omission of the division by M.
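
Both position-based definitions can be sketched in Python. The function names are hypothetical, and two conventions are assumptions of this sketch: the "center nucleotide" of an even-length probe is taken as start + (length - 1) // 2, and a center counts as inside the window when its distance from the window center is strictly less than M/2. These conventions were chosen because they reproduce the numerators listed in the worked example above for the four remaining 12-mers (starting positions 1, 2, 3 and 5) with M = 10.

```python
def windowed_cluster_counts(starts, probe_len, window, seq_len):
    """Numerator of the windowed cluster size at positions 1..seq_len.

    starts: 1-based starting positions of the filtered oligonucleotides.
    Divide each count by `window` to obtain the cluster size itself.
    """
    centers = [s + (probe_len - 1) // 2 for s in starts]
    counts = []
    for pos in range(1, seq_len + 1):
        inside = sum(1 for c in centers if 2 * abs(c - pos) < window)
        counts.append(inside)
    return counts


def overlap_counts(starts, probe_len, seq_len):
    """Number of filtered oligonucleotides covering each position."""
    return [
        sum(1 for s in starts if s <= pos <= s + probe_len - 1)
        for pos in range(1, seq_len + 1)
    ]


counts = windowed_cluster_counts([1, 2, 3, 5], 12, 10, 16)
# counts == [0, 1, 2, 3, 3, 4, 4, 4, 4, 4, 3, 2, 1, 1, 0, 0]
```

Other edge conventions shift the run of maximal values by one position, so the convention should be fixed before cut-off values are compared across targets.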

Finally, cluster size can be approximated at each position in a nucleotide
sequence by dividing the sequence into oligonucleotides, evaluating a numerical
score for each oligonucleotide, and then averaging the scores in the neighborhood
of each position by means of a moving window average as described above.
Window averaging has the effect of reinforcing clusters of high or low values
around a particular position, while canceling varying values about that position.
The window average, therefore, provides a score that is sensitive to both the
hybridization potential of a given oligonucleotide and the hybridization potentials of
its neighbors.

In a next step of the present method, the oligonucleotides in the subset are
ranked. Generally, this ranking is based on the lengths of the clusters or contigs,
sizes of the clusters or values of a window averaged score. Oligonucleotides
found in the longest contigs or largest clusters, or possessing the highest window
averaged scores usually show the highest hybridization efficiencies. Often, the
highest signal intensity within the cluster corresponds to the median
oligonucleotide of the cluster. However, the peak signal intensity within the contig
can be determined experimentally, by sampling the cluster at its first quartile,
midpoint and third quartile, measuring the hybridization efficiencies of the sampled
oligonucleotides, interpolating or extrapolating the results, predicting the position
of the optimal probe, and then iterating the probe design process.
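
The sampling step may be sketched in Python as follows. The function name is hypothetical, and the rounding rule used to pick the member nearest each quartile is an assumption of this sketch; a contig is represented by the ordered list of its starting positions.

```python
def quartile_samples(contig):
    """Pick the members of a contig nearest its first quartile,
    midpoint and third quartile, without duplicates.

    contig: ordered list of starting positions of a contig's members.
    """
    if not contig:
        raise ValueError("contig must contain at least one member")
    last = len(contig) - 1
    picks = []
    for fraction in (0.25, 0.5, 0.75):
        index = int(fraction * last + 0.5)  # round half up
        if contig[index] not in picks:
            picks.append(contig[index])
    return picks
```

For a contig of nine members starting at positions 10 through 18, the sampled members start at positions 12, 14 and 16; for very short contigs the three samples collapse to fewer distinct members.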

Fig. 1 shows a diagram of an example of the above-described method by
way of illustration and not limitation. Referring to Fig. 1, a target sequence of
length L from, e.g., a database, is used to generate a sequence that is
hybridizable to the target sequence from which candidate oligonucleotide probe
sequences are generated. One or more parameters are calculated for each of the
oligonucleotide probe sequences. The candidate oligonucleotide probe
sequences are filtered based on the values of the parameters. Clustering of the
filtered candidate probe sequences is evaluated and the clusters are ranked by
size. Then, the oligonucleotide probes are statistically sampled and synthesized.
Further evaluation may be made by evaluating the hybridization of the selected
oligonucleotide probes in real hybridization experiments. The above process may
be reiterated to further define the selection. In this way only a small fraction of the
potential oligonucleotide probe candidates are synthesized and tested. This is in
sharp contrast to the known method of synthesizing and testing all or a major
portion of potential oligonucleotide probes for a given target sequence.
The methods of the present invention are preferably carried out at least in
part with the aid of a computer. For example, an IBM® compatible personal
computer (PC) may be utilized. The computer is driven by software specific to the
methods described herein.

The preferred computer hardware capable of assisting in the operation of
the methods in accordance with the present invention involves a system with at
least the following specifications: Pentium® processor or better with a clock speed
of at least 100 MHz, at least 32 megabytes of random access memory (RAM) and
at least 80 megabytes of virtual memory, running under either the Windows 95 or
Windows NT 4.0 operating system (or successor thereof).

As mentioned above, software that may be used to carry out the methods
may be either Microsoft Excel or Microsoft Access, suitably extended via user-
written functions and templates, and linked when necessary to stand-alone
programs that calculate specific parameters (e.g., MFOLD for intramolecular
thermodynamic parameters). Examples of software programs used in assisting in
conducting the present methods may be written, preferably, in Visual BASIC,
FORTRAN and C++, as exemplified below in the Examples. It should be
understood that the above computer information and the software used herein are
by way of example and not limitation. The present methods may be adapted to
other computers and software. Other languages that may be used include, for
example, PASCAL, PERL or assembly language.

Fig. 2 depicts a more specific approach to a method in accordance with the
present invention. Referring to Fig. 2, a sequence of length L is obtained from a
database such as GenBank, UniGene or a proprietary sequence database. Probe
length N is determined by the user based on the requirements for sensitivity and
specificity and the limitations of the oligonucleotide synthetic scheme employed.
The probe length and sequence length are used to generate L-N+1 candidate
oligonucleotide probes, i.e., from every possible starting position. An initial
selection is made based on local sequence predicted thermodynamic properties.
To this end, melting temperature Tm and the self-structure free energy ΔGmfold
are calculated for each of the potential oligonucleotide probe:target nucleotide
sequence complexes. Next, M probes that satisfy Tm and ΔGmfold filters are
selected. A further selection can be made based on clustering of "good"
parameters. Good parameters are parameters that satisfy all of the filters in the
filter set. Clustering is defined by any of the methods described previously; in Fig.
2, the "contig length" definition of clustering is used.

For each of the M oligonucleotide sequences that satisfied all filters the
question is asked whether the oligonucleotide sequence immediately following the 
sequence under consideration is also one of the sequences that satisfied all of the 
filters. If the answer to this question is NO, then one stores the current value of 
the contig length counter, resets the counter to zero and proceeds to the next 
oligonucleotide sequence that satisfied all filters. If the answer to the question is 
YES, then 1 is added to the contig length counter and, if the counter now equals 1 
(i.e., this is the first oligonucleotide probe sequence in the contig), the starting
position of the oligonucleotide is stored. One then moves to the next 
oligonucleotide that satisfied all filters, which, in this case, is the same as the next 
oligonucleotide before the application of the filter set. The process is repeated 
until all M filtered oligonucleotide sequences have been examined. In this way, a 
single pass through the set of M filtered oligonucleotide sequences generates the 
lengths and starting positions of all contigs. 
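
The single pass described above can be sketched in Python as follows. This is a minimal illustration, not the code of the Appendix: the function name `find_contigs` and the input format (a sorted list of starting positions of the oligonucleotides that satisfied all filters) are assumptions. Note also one deliberate difference: the counter in the text is incremented once per YES answer, whereas this sketch reports the number of oligonucleotides in each run.

```python
def find_contigs(passing_starts):
    """Return (start, length) for each run of consecutive start positions.

    passing_starts: sorted start positions of the oligonucleotides that
    satisfied every filter (hypothetical input format).
    length counts the oligonucleotides in the run.
    """
    contigs = []
    run_start = None
    run_length = 0
    previous = None
    for pos in passing_starts:
        if previous is not None and pos == previous + 1:
            run_length += 1                      # neighbour also passed: extend the run
        else:
            if run_length:
                contigs.append((run_start, run_length))  # store the finished run
            run_start, run_length = pos, 1       # begin a new run at this position
        previous = pos
    if run_length:
        contigs.append((run_start, run_length))  # store the final run
    return contigs


example = find_contigs([3, 4, 5, 9, 12, 13])
```

Each position is visited once, so the cost grows linearly with M, in keeping with the single-pass property noted in the text.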

Next, contigs are ranked based on the lengths of their contiguous 
sequence elements. Longer contig lengths generally correlate with higher 
hybridization efficiencies. All oligonucleotides of the higher-ranking contigs may 
be considered, or candidate oligonucleotide probes may be picked. For example, 
candidate oligonucleotide probes can be picked one quarter, one half and three 
quarters of the way through each contig. The latter approach provides local 
curvature determination after experimental determination of hybridization 
efficiencies, which allows either interpolation or extrapolation of the positions of 
the next probes to be synthesized in order to close in on the optimal probe in the 
region. If the contig brackets the actual peak of hybridization efficiency, the 
process will converge in 2-3 iterations. If the contig lies to one side of the actual 
peak, the process will converge in 3-4 iterations. 
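
As a rough sketch of the ranking and sampling step, the following Python ranks contigs by length and picks probes one quarter, one half and three quarters of the way through a contig. The helper names and the rounding rule (truncation toward the start of the contig) are assumptions made for illustration; the text does not specify how fractional positions are rounded.

```python
def rank_contigs(contigs):
    """Sort (start, length) pairs so that the longest contig comes first."""
    return sorted(contigs, key=lambda c: c[1], reverse=True)


def pick_probes(start, length, fractions=(0.25, 0.5, 0.75)):
    """Pick start positions at the given fractions of the way through a contig.

    Truncating f * length is an assumed rounding rule; duplicates that
    arise for very short contigs are dropped.
    """
    picks = []
    for f in fractions:
        offset = min(length - 1, int(f * length))
        pos = start + offset
        if pos not in picks:
            picks.append(pos)
    return picks


ranked = rank_contigs([(3, 3), (9, 1), (12, 13)])
chosen = pick_probes(*ranked[0])
```

With a contig of 13 oligonucleotides starting at position 12, this rule selects the oligonucleotides starting at positions 15, 18 and 21.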

The above illustrative approach is further described with reference to the 
following DNA nucleotide sequence, which is the complement of the target RNA 
nucleotide sequence: 
GTCCAAAAAGGGTCAGTCTACCTCCCGCCATAAAAAACTCATGTTCAAGA
(SEQ ID NO:9). 

In the first step of the method, the nucleotide sequence is divided into overlapping 
oligonucleotides that are 25 nucleotides in length. This length is chosen because 
it is an effective compromise between the need for sensitivity (enhanced by longer 
oligonucleotides) and the chemosynthetic efficiency of schemes for synthesis of 
surface-bound arrays of oligonucleotide probes. 
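
The division into overlapping oligonucleotides can be illustrated with a short Python sketch. The function name `candidate_probes` is hypothetical; the sequence is SEQ ID NO:9 as given above. For a sequence of length L = 50 and probe length N = 25, the sketch yields L-N+1 = 26 candidates, one from every possible starting position.

```python
def candidate_probes(sequence, n):
    """Return every length-n window of sequence, one per starting position."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


# Complement of the target RNA sequence (SEQ ID NO:9 in the text).
target_complement = "GTCCAAAAAGGGTCAGTCTACCTCCCGCCATAAAAAACTCATGTTCAAGA"
probes = candidate_probes(target_complement, 25)
```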

Next, the estimated duplex melting temperatures (Tm) and self-structure
free energies (ΔGmfold) are calculated for each oligonucleotide in the set of
overlapping oligonucleotides. The values are obtained from a user-written
function that calculates DNA/RNA heteroduplex thermodynamic parameters (see
N. Sugimoto, et al., Biochemistry, 34:11211 (1995)) and a modified version of the
program MFOLD that estimates the free energy of the most stable intramolecular
structure of a single stranded DNA molecule (see J.A. Jaeger, et al. (1989),
supra), respectively. The steps are illustrated below.



GTCCAAAAAGGGTCAGTCTACCTCCCGCCATAAAAAACTCATGTTCAAGA (target complement sequence)

Oligonucleotide              Tm (°C)   ΔGmfold   SEQ ID NO
GTCCAAAAAGGGTCAGTCTACCTCC     71.77     -1.20    10
TCCAAAAAGGGTCAGTCTACCTCCC     71.99     -1.20    11
CCAAAAAGGGTCAGTCTACCTCCCG     70.78     -1.20    12
CAAAAAGGGTCAGTCTACCTCCCGC     71.23     -1.20    13
AAAAAGGGTCAGTCTACCTCCCGCC     73.07     -1.20    14
AAAAGGGTCAGTCTACCTCCCGCCA     75.68     -1.20    15
AAAGGGTCAGTCTACCTCCCGCCAT     77.53     -1.20    16
AAGGGTCAGTCTACCTCCCGCCATA     79.03     -1.20    17
AGGGTCAGTCTACCTCCCGCCATAA     79.03     -1.20    18
GGGTCAGTCTACCTCCCGCCATAAA     76.85     -1.20    19
GGTCAGTCTACCTCCCGCCATAAAA     73.10     -0.80    20
GTCAGTCTACCTCCCGCCATAAAAA     69.50      0.90    21
TCAGTCTACCTCCCGCCATAAAAAA     65.60      0.90    22
CAGTCTACCTCCCGCCATAAAAAAC     64.96      0.90    23
AGTCTACCTCCCGCCATAAAAAACT     65.48      1.10    24
GTCTACCTCCCGCCATAAAAAACTC     66.36      2.40    25
TCTACCTCCCGCCATAAAAAACTCA     64.97      2.90    26
CTACCTCCCGCCATAAAAAACTCAT     63.96      2.70    27
TACCTCCCGCCATAAAAAACTCATG     62.58      1.10    28
ACCTCCCGCCATAAAAAACTCATGT     65.10      0.40    29
CCTCCCGCCATAAAAAACTCATGTT     64.96      0.10    30
CTCCCGCCATAAAAAACTCATGTTC     63.37     -0.10    31
TCCCGCCATAAAAAACTCATGTTCA     62.86     -0.10    32
CCCGCCATAAAAAACTCATGTTCAA     60.47     -0.10    33
CCGCCATAAAAAACTCATGTTCAAG     57.98     -0.10    34
CGCCATAAAAAACTCATGTTCAAGA     56.20     -0.10    35
Next, the oligonucleotide sequences are filtered on the basis of Tm. A high
and low cut-off value may be selected, for example, 60°C < Tm < 85°C. Thus,
oligonucleotides having Tm values falling within the above range are retained.
Those outside the range are discarded, which is indicated below in the Result
column for those oligonucleotides and parameter values.

GTCCAAAAAGGGTCAGTCTACCTCCCGCCATAAAAAACTCATGTTCAAGA (target complement sequence)

Oligonucleotide              Tm (°C)   ΔGmfold   Result
GTCCAAAAAGGGTCAGTCTACCTCC     71.77     -1.20    retained
TCCAAAAAGGGTCAGTCTACCTCCC     71.99     -1.20    retained
CCAAAAAGGGTCAGTCTACCTCCCG     70.78     -1.20    retained
CAAAAAGGGTCAGTCTACCTCCCGC     71.23     -1.20    retained
AAAAAGGGTCAGTCTACCTCCCGCC     73.07     -1.20    retained
AAAAGGGTCAGTCTACCTCCCGCCA     75.68     -1.20    retained
AAAGGGTCAGTCTACCTCCCGCCAT     77.53     -1.20    retained
AAGGGTCAGTCTACCTCCCGCCATA     79.03     -1.20    retained
AGGGTCAGTCTACCTCCCGCCATAA     79.03     -1.20    retained
GGGTCAGTCTACCTCCCGCCATAAA     76.85     -1.20    retained
GGTCAGTCTACCTCCCGCCATAAAA     73.10     -0.80    retained
GTCAGTCTACCTCCCGCCATAAAAA     69.50      0.90    retained
TCAGTCTACCTCCCGCCATAAAAAA     65.60      0.90    retained
CAGTCTACCTCCCGCCATAAAAAAC     64.96      0.90    retained
AGTCTACCTCCCGCCATAAAAAACT     65.48      1.10    retained
GTCTACCTCCCGCCATAAAAAACTC     66.36      2.40    retained
TCTACCTCCCGCCATAAAAAACTCA     64.97      2.90    retained
CTACCTCCCGCCATAAAAAACTCAT     63.96      2.70    retained
TACCTCCCGCCATAAAAAACTCATG     62.58      1.10    retained
ACCTCCCGCCATAAAAAACTCATGT     65.10      0.40    retained
CCTCCCGCCATAAAAAACTCATGTT     64.96      0.10    retained
CTCCCGCCATAAAAAACTCATGTTC     63.37     -0.10    retained
TCCCGCCATAAAAAACTCATGTTCA     62.86     -0.10    retained
CCCGCCATAAAAAACTCATGTTCAA     60.47     -0.10    retained
CCGCCATAAAAAACTCATGTTCAAG     57.98     -0.10    discarded (Tm)
CGCCATAAAAAACTCATGTTCAAGA     56.20     -0.10    discarded (Tm)
Next, the oligonucleotide sequences remaining after the above exercise are
filtered on the basis of ΔGmfold and are retained if the value is greater than -0.4.
Those oligonucleotides with a ΔGmfold less than -0.4 are discarded, which is
indicated below in the Result column for those oligonucleotides and parameter
values.

GTCCAAAAAGGGTCAGTCTACCTCCCGCCATAAAAAACTCATGTTCAAGA (target complement sequence)

Oligonucleotide              Tm (°C)   ΔGmfold   Result
GTCCAAAAAGGGTCAGTCTACCTCC     71.77     -1.20    discarded (ΔGmfold)
TCCAAAAAGGGTCAGTCTACCTCCC     71.99     -1.20    discarded (ΔGmfold)
CCAAAAAGGGTCAGTCTACCTCCCG     70.78     -1.20    discarded (ΔGmfold)
CAAAAAGGGTCAGTCTACCTCCCGC     71.23     -1.20    discarded (ΔGmfold)
AAAAAGGGTCAGTCTACCTCCCGCC     73.07     -1.20    discarded (ΔGmfold)
AAAAGGGTCAGTCTACCTCCCGCCA     75.68     -1.20    discarded (ΔGmfold)
AAAGGGTCAGTCTACCTCCCGCCAT     77.53     -1.20    discarded (ΔGmfold)
AAGGGTCAGTCTACCTCCCGCCATA     79.03     -1.20    discarded (ΔGmfold)
AGGGTCAGTCTACCTCCCGCCATAA     79.03     -1.20    discarded (ΔGmfold)
GGGTCAGTCTACCTCCCGCCATAAA     76.85     -1.20    discarded (ΔGmfold)
GGTCAGTCTACCTCCCGCCATAAAA     73.10     -0.80    discarded (ΔGmfold)
GTCAGTCTACCTCCCGCCATAAAAA     69.50      0.90    retained
TCAGTCTACCTCCCGCCATAAAAAA     65.60      0.90    retained
CAGTCTACCTCCCGCCATAAAAAAC     64.96      0.90    retained
AGTCTACCTCCCGCCATAAAAAACT     65.48      1.10    retained
GTCTACCTCCCGCCATAAAAAACTC     66.36      2.40    retained
TCTACCTCCCGCCATAAAAAACTCA     64.97      2.90    retained
CTACCTCCCGCCATAAAAAACTCAT     63.96      2.70    retained
TACCTCCCGCCATAAAAAACTCATG     62.58      1.10    retained
ACCTCCCGCCATAAAAAACTCATGT     65.10      0.40    retained
CCTCCCGCCATAAAAAACTCATGTT     64.96      0.10    retained
CTCCCGCCATAAAAAACTCATGTTC     63.37     -0.10    retained
TCCCGCCATAAAAAACTCATGTTCA     62.86     -0.10    retained
CCCGCCATAAAAAACTCATGTTCAA     60.47     -0.10    retained
CCGCCATAAAAAACTCATGTTCAAG     57.98     -0.10    discarded (Tm)
CGCCATAAAAAACTCATGTTCAAGA     56.20     -0.10    discarded (Tm)
Clusters of retained oligonucleotides are identified and ranked based on
cluster size. In this example, a contiguous cluster of 13 retained oligonucleotides
is identified by the vertical bar on the left. Any or all of the oligonucleotides
in this cluster may be evaluated experimentally.

GTCCAAAAAGGGTCAGTCTACCTCCCGCCATAAAAAACTCATGTTCAAGA (target complement sequence)

  Oligonucleotide              Tm (°C)   ΔGmfold   Result
  GTCCAAAAAGGGTCAGTCTACCTCC     71.77     -1.20    discarded (ΔGmfold)
  TCCAAAAAGGGTCAGTCTACCTCCC     71.99     -1.20    discarded (ΔGmfold)
  CCAAAAAGGGTCAGTCTACCTCCCG     70.78     -1.20    discarded (ΔGmfold)
  CAAAAAGGGTCAGTCTACCTCCCGC     71.23     -1.20    discarded (ΔGmfold)
  AAAAAGGGTCAGTCTACCTCCCGCC     73.07     -1.20    discarded (ΔGmfold)
  AAAAGGGTCAGTCTACCTCCCGCCA     75.68     -1.20    discarded (ΔGmfold)
  AAAGGGTCAGTCTACCTCCCGCCAT     77.53     -1.20    discarded (ΔGmfold)
  AAGGGTCAGTCTACCTCCCGCCATA     79.03     -1.20    discarded (ΔGmfold)
  AGGGTCAGTCTACCTCCCGCCATAA     79.03     -1.20    discarded (ΔGmfold)
  GGGTCAGTCTACCTCCCGCCATAAA     76.85     -1.20    discarded (ΔGmfold)
  GGTCAGTCTACCTCCCGCCATAAAA     73.10     -0.80    discarded (ΔGmfold)
| GTCAGTCTACCTCCCGCCATAAAAA     69.50      0.90    retained
| TCAGTCTACCTCCCGCCATAAAAAA     65.60      0.90    retained
| CAGTCTACCTCCCGCCATAAAAAAC     64.96      0.90    retained
| AGTCTACCTCCCGCCATAAAAAACT     65.48      1.10    retained
| GTCTACCTCCCGCCATAAAAAACTC     66.36      2.40    retained
| TCTACCTCCCGCCATAAAAAACTCA     64.97      2.90    retained
| CTACCTCCCGCCATAAAAAACTCAT     63.96      2.70    retained
| TACCTCCCGCCATAAAAAACTCATG     62.58      1.10    retained
| ACCTCCCGCCATAAAAAACTCATGT     65.10      0.40    retained
| CCTCCCGCCATAAAAAACTCATGTT     64.96      0.10    retained
| CTCCCGCCATAAAAAACTCATGTTC     63.37     -0.10    retained
| TCCCGCCATAAAAAACTCATGTTCA     62.86     -0.10    retained
| CCCGCCATAAAAAACTCATGTTCAA     60.47     -0.10    retained
  CCGCCATAAAAAACTCATGTTCAAG     57.98     -0.10    discarded (Tm)
  CGCCATAAAAAACTCATGTTCAAGA     56.20     -0.10    discarded (Tm)
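
The two filtering steps of this worked example can be checked with a short Python sketch. The Tm and ΔGmfold values are those listed for the 26 overlapping oligonucleotides of SEQ ID NO:9; the function name `passes` and the 1-based row numbering are assumptions made for illustration.

```python
# Tm (°C) and ΔGmfold for the 26 overlapping 25-mers, in order of start position.
TM = [71.77, 71.99, 70.78, 71.23, 73.07, 75.68, 77.53, 79.03, 79.03, 76.85,
      73.10, 69.50, 65.60, 64.96, 65.48, 66.36, 64.97, 63.96, 62.58, 65.10,
      64.96, 63.37, 62.86, 60.47, 57.98, 56.20]
DG = [-1.20] * 10 + [-0.80, 0.90, 0.90, 0.90, 1.10, 2.40, 2.90, 2.70, 1.10,
                     0.40, 0.10, -0.10, -0.10, -0.10, -0.10, -0.10]


def passes(tm, dg, tm_low=60.0, tm_high=85.0, dg_min=-0.4):
    """Apply the example cut-offs: 60 < Tm < 85 and ΔGmfold > -0.4."""
    return tm_low < tm < tm_high and dg > dg_min


# 1-based row numbers of the oligonucleotides that satisfy both filters.
retained = [i + 1 for i, (tm, dg) in enumerate(zip(TM, DG)) if passes(tm, dg)]
```

Under these cut-offs the retained rows are 12 through 24, a single contiguous cluster of 13 oligonucleotides, in agreement with the listing above.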
Alternatively, in one approach the oligonucleotides at the first quartile, the
median and the third quartile of the cluster may be selected for experimental
evaluation from the cluster marked below.

GTCCAAAAAGGGTCAGTCTACCTCCCGCCATAAAAAACTCATGTTCAAGA (target complement sequence)

  Oligonucleotide              Tm (°C)   ΔGmfold   Result
  GTCCAAAAAGGGTCAGTCTACCTCC     71.77     -1.20    discarded (ΔGmfold)
  TCCAAAAAGGGTCAGTCTACCTCCC     71.99     -1.20    discarded (ΔGmfold)
  CCAAAAAGGGTCAGTCTACCTCCCG     70.78     -1.20    discarded (ΔGmfold)
  CAAAAAGGGTCAGTCTACCTCCCGC     71.23     -1.20    discarded (ΔGmfold)
  AAAAAGGGTCAGTCTACCTCCCGCC     73.07     -1.20    discarded (ΔGmfold)
  AAAAGGGTCAGTCTACCTCCCGCCA     75.68     -1.20    discarded (ΔGmfold)
  AAAGGGTCAGTCTACCTCCCGCCAT     77.53     -1.20    discarded (ΔGmfold)
  AAGGGTCAGTCTACCTCCCGCCATA     79.03     -1.20    discarded (ΔGmfold)
  AGGGTCAGTCTACCTCCCGCCATAA     79.03     -1.20    discarded (ΔGmfold)
  GGGTCAGTCTACCTCCCGCCATAAA     76.85     -1.20    discarded (ΔGmfold)
  GGTCAGTCTACCTCCCGCCATAAAA     73.10     -0.80    discarded (ΔGmfold)
| GTCAGTCTACCTCCCGCCATAAAAA     69.50      0.90    retained
| TCAGTCTACCTCCCGCCATAAAAAA     65.60      0.90    retained
| CAGTCTACCTCCCGCCATAAAAAAC     64.96      0.90    retained
| AGTCTACCTCCCGCCATAAAAAACT     65.48      1.10    retained
| GTCTACCTCCCGCCATAAAAAACTC     66.36      2.40    retained
| TCTACCTCCCGCCATAAAAAACTCA     64.97      2.90    retained
| CTACCTCCCGCCATAAAAAACTCAT     63.96      2.70    retained
| TACCTCCCGCCATAAAAAACTCATG     62.58      1.10    retained
| ACCTCCCGCCATAAAAAACTCATGT     65.10      0.40    retained
| CCTCCCGCCATAAAAAACTCATGTT     64.96      0.10    retained
| CTCCCGCCATAAAAAACTCATGTTC     63.37     -0.10    retained
| TCCCGCCATAAAAAACTCATGTTCA     62.86     -0.10    retained
| CCCGCCATAAAAAACTCATGTTCAA     60.47     -0.10    retained
  CCGCCATAAAAAACTCATGTTCAAG     57.98     -0.10    discarded (Tm)
  CGCCATAAAAAACTCATGTTCAAGA     56.20     -0.10    discarded (Tm)



In one aspect of the present method, at least two parameters are 
determined wherein the parameters are poorly correlated with respect to one 
another. The reason for requiring that the different parameters chosen are poorly 
correlated with one another is that an additional parameter that is strongly 
correlated to the original parameter brings no additional information to the 
prediction process. The correlation to the original parameter is a strong indication 
that both parameters represent the same physical property of the system. 
Another way of stating this is that correlated parameters are linearly dependent on 
one another, while poorly correlated parameters are linearly independent of one 
another. In practice, the absolute value of the correlation coefficient between any 
two parameters should be less than 0.5, more preferably, less than 0.25, and, 
most preferably, as close to zero as possible. 
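
The correlation criterion can be illustrated with a minimal Python sketch that computes the Pearson correlation coefficient between two candidate parameters and tests it against the 0.5 limit given above. The function names and the sample values are hypothetical, and the use of the Pearson coefficient specifically is an assumption; the text speaks only of "the correlation coefficient".

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant parameter")
    return cov / (spread_x * spread_y)


def poorly_correlated(xs, ys, limit=0.5):
    """True when |r| is below the limit, i.e. the pair adds information."""
    return abs(pearson(xs, ys)) < limit


r_dependent = pearson([1, 2, 3, 4], [2, 4, 6, 8])      # linearly dependent pair
r_independent = pearson([1, 2, 3, 4], [1, -1, -1, 1])  # uncorrelated pair
```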
In one preferred approach instead of Tm, for each oligonucleotide/target
nucleotide sequence duplex, the difference between the predicted duplex melting
temperature corrected for salt concentration and the temperature of hybridization
of each of the oligonucleotides with the target nucleotide sequence is determined.

In one aspect the present method comprises determining two parameters,
at least one of the parameters being the association free energy between a
subsequence within each of the oligonucleotides and its complementary sequence
on the target nucleotide sequence, or some similar, strongly correlated parameter.
The object of this approach is to identify a particularly stable subsequence of the
oligonucleotide that might be capable of acting as a nucleation site for the
beginning of the heteroduplex formation between the oligonucleotide and the
target nucleotide sequence. Such nucleation is believed to be the rate-limiting
step for the process of heteroduplex formation.

The subsequence within the oligonucleotide is from about 3 to 9
nucleotides in length, usually, 5 to 7 nucleotides in length. The subsequence is at
least three nucleotides from the terminus of the oligonucleotide. For
support-bound oligonucleotides the subsequence is at least three nucleotides from the
free end of the oligonucleotide, i.e., the end that is not attached to the support.
Generally, this free end is the 5' end of the oligonucleotide. When the
oligonucleotide is attached to a support, the subsequence is at least three
nucleotides from the end of the oligonucleotide that is bound to the surface of the
support to which the oligonucleotide is attached. Generally, the 3' end of the
oligonucleotide is bound to the support.

The predictive parameter can be, for example, either melting temperature
or duplex free energy of the subsequence with the target nucleotide sequence.
The subsequence with the maximum (melting temperature) or minimum (free
energy) value of one of the above parameters is chosen as the representative
subsequence for that oligonucleotide probe. For example, if the oligonucleotide is
20 nucleotides in length and a subsequence of 5 nucleotides is chosen, i.e., a
5-mer, then parameter values are calculated for all 5-mer subsequences of the
oligonucleotide that do not include the 2 nucleotides at the free end of the
oligonucleotide. Where 5' is the free end of the oligonucleotide with designated
nucleotide number 1, the values are calculated for all 5-mer subsequences with
starting nucleotides from position number 3 to position number 16. Thus, in this
example, parameter values for 14 different subsequences are calculated. The
subsequence with the maximum value for the parameter is then assigned as the
stability subsequence for the oligonucleotide.
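
The enumeration in this example can be sketched in Python as follows. The function names are hypothetical, and the scoring function used here (a simple count of G and C bases) is only a stand-in assumed for illustration; the method described above scores each subsequence by its melting temperature or duplex free energy with the target.

```python
def subsequence_windows(oligo, k=5, skip_free_end=2):
    """List (1-based start, subsequence) for every k-mer that avoids the
    first skip_free_end nucleotides at the free (5') end."""
    return [(i + 1, oligo[i:i + k])
            for i in range(skip_free_end, len(oligo) - k + 1)]


def gc_count(subsequence):
    """Placeholder stability score (assumption): number of G and C bases."""
    return subsequence.count("G") + subsequence.count("C")


def stability_subsequence(oligo, k=5, skip_free_end=2, score=gc_count):
    """Return the (start, subsequence) pair with the maximum score."""
    windows = subsequence_windows(oligo, k, skip_free_end)
    return max(windows, key=lambda w: score(w[1]))


oligo_20mer = "GTCCAAAAAGGGTCAGTCTA"   # first 20 nucleotides of SEQ ID NO:9
windows = subsequence_windows(oligo_20mer)
best = stability_subsequence(oligo_20mer)
```

For a 20-mer and a 5-mer window the sketch produces 14 subsequences with starting positions 3 through 16, as in the text. A free-energy score would be minimized rather than maximized, which can be handled by passing a score function that returns the negated free energy.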
The inclusion of the above determination of a stability subsequence results
in the following algorithm for determining the potential of an oligonucleotide to
hybridize to a target nucleotide sequence. A predetermined number of unique
oligonucleotides are identified within a nucleotide sequence that is hybridizable
with said target nucleotide sequence. The oligonucleotides are chosen to sample
the entire length of the nucleotide sequence. For each of the oligonucleotides,
parameters that are independently predictive of the ability of each of said
oligonucleotides to hybridize to said target nucleotide sequence are determined
and evaluated. Two parameters that may be used are the thermodynamic
parameters of Tm and ΔGmfold. These parameters give rise to associated
parameter filters. In one approach evaluation of the parameters involves
establishing cut-off values as described above. Application of these cut-off values
results in the identification of a subset of oligonucleotides for further scrutiny under
the algorithm. In accordance with this embodiment of the present invention, there
is included a stability subsequence limit in addition to the above. Cutoff values
are determined either by means of objective optimization algorithms well known to
the art or via graphical estimation methods; both approaches have been described
previously in this document. In either case, the optimization of cutoff values
involves comparison of predictions to known hybridization efficiency data sets.
This process results in objective optimization as it looks at prediction versus
experimental results and is otherwise referred to herein as "training the
algorithm." The experimental data used to train the algorithm is referred to herein
as "training data."

In the present approach filters are assigned to the Tm oligonucleotide probe
data. The Tm of each oligonucleotide probe needs to be greater than or equal to
the assigned filter (Tm probe limit) to be given a filter score of "1"; otherwise, the
filter score is "0". In addition, one can also impose a second filter for this
parameter; that is, that the Tm of the oligonucleotide probe also has to be less
than a defined upper limit. Filters are also assigned to the ΔGmfold data. The
ΔGmfold of each oligonucleotide probe should be greater than or equal to the
assigned filter (ΔGmfold limit) to be given a filter score of "1"; otherwise, the filter
score is "0". The filter scores are added. Furthermore, one can also impose a
second filter for this parameter; that is, that the ΔGmfold also has to be less than a
defined upper limit. In accordance with the above discussion stability
subsequences are identified. This leads to another filter. Accordingly, filters are
assigned to the stability sequence data. The stability subsequence of each
oligonucleotide probe needs to be greater than or equal to the assigned filter limit
to be given a filter score of "1"; otherwise, the filter score is "0". In addition, one
can also impose a second filter for this parameter; that is, that the stability
subsequence also has to be less than a defined upper limit. In all cases, the filter
values are determined by objective optimization (algorithmic or graphical) of the
predictions of the present method versus training data, as described previously.
On the basis of the above filter sets a subset of oligonucleotides within said
predetermined number of unique oligonucleotides is identified. Oligonucleotides
in the subset are identified that are clustered along a region of the nucleotide
sequence that is hybridizable to the target nucleotide sequence. The resulting
number of oligonucleotide probe regions is examined. The above filters may then
be loosened or tightened by changing the filter limits to obtain more or fewer
clusters of oligonucleotides to match the goal, which is set by the needs of the
investigator. For instance, a particular application might require that the
investigator design 5 non-overlapping probes that efficiently hybridize to a given
target sequence.

As mentioned above, the contigs may be selected on the basis of contig
length. In another approach, the scores defined above may be summed for
cluster size determination. To this end the probe score of the particular filter set
(e.g., Tm probe limit, ΔGmfold limit and stability sequence limit) is calculated for
each oligonucleotide probe. The probe score is the sum of the filter scores. Thus,
the probe score is 0 if no parameters pass their respective filters. The probe score
is 1, 2 or 3 if one, two or three parameters, respectively, pass their filters for that
oligonucleotide probe. This summing is continued for each parameter that is in the
current filter set of the algorithm used. For a given algorithm a minimum probe
score limit is set. In the current example this limit will be at least 1 and could be 2
or 3 depending on the needs of the investigator, the number of probe clusters
required and the results of objective optimizations of algorithm performance
against training data. The probe score is compared to this probe score limit. If the
probe score of oligonucleotide probe i is greater than or equal to the probe score
limit, then oligonucleotide probe i is assigned a score passed value of 1. Next, a
window is chosen for the evaluation of clustering (the "cluster window"). This will
be the next filter applied. The cluster window ("w") smoothes the score passed
values by summing the values in a window w nucleotides long, centered about
position i. The resulting sum is called the cluster sum. Usually, the cluster window
is an odd integer, usually 7 or 9 nucleotides. The cluster sum values are then
filtered, by comparing to a user-set threshold, cluster filter. If cluster sum is
greater than or equal to cluster filter, this filter is passed, and the probe is
predicted to hybridize efficiently to its target.
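
The scoring and window-summing steps can be sketched in Python as follows. All function names are hypothetical, and two details are assumptions not stated in the text: a probe that fails the probe score limit is given a score passed value of 0, and windows near the ends of the sequence are simply truncated.

```python
def filter_score(value, lower, upper=None):
    """1 if value >= lower (and value < upper when an upper limit is set), else 0."""
    ok = value >= lower and (upper is None or value < upper)
    return 1 if ok else 0


def score_passed(probe_scores, probe_score_limit):
    """1 for each probe whose summed filter score reaches the limit, else 0."""
    return [1 if s >= probe_score_limit else 0 for s in probe_scores]


def cluster_sums(passed, w=7):
    """Sum the score passed values in a window of w probes centered on each
    position; windows are truncated at the ends (an assumption)."""
    half = w // 2
    return [sum(passed[max(0, i - half): i + half + 1])
            for i in range(len(passed))]


def predicted(passed, w=7, cluster_filter=5):
    """True where the cluster sum reaches the user-set cluster filter."""
    return [s >= cluster_filter for s in cluster_sums(passed, w)]


passed = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]
sums = cluster_sums(passed, w=3)
calls = predicted(passed, w=3, cluster_filter=2)
```

In this small example the probe at the fourth position did not pass its own probe score limit, yet its cluster sum still reaches the threshold because its neighbours passed.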

This window summing procedure converts the score passed value
for each oligonucleotide into a consensus metric for a set of w adjacent probes. A
"consensus metric" is a measurement that distills a number of values into one
consensus value. In this case, the consensus value is calculated by simply
summing the individual values. The window summing procedure therefore
evaluates a property similar to the contig length metric discussed above.
However, the summed score has the advantage of allowing for a few probes
within a cluster to have not passed their individual probe score limits. We have
found that this allows more observed hybridization peaks to be predicted.

It may be desired in some circumstances to combine the results of multiple
algorithm versions. We refer to this operation as "tiling". This may be explained
more fully as follows. Tiling generally involves joining together the predicted
oligonucleotide probe sets identified by multiple algorithm versions. In the context
of the present invention, tiling multiple algorithm versions involves forming the
union of multiple sets of predictions. These predictions may arise from different
embodiments of the present invention. Alternatively, the different sets of
predictions may arise from the same embodiment, but different filter sets. The
different filter sets may additionally be restricted to different combinations of
parameter values. For instance, one filter set might be used when the predicted
duplex melting temperature Tm is greater than or equal to some value, while
another might be used when Tm is below that value.

An example of the logical endpoint of tiling multiple filter sets across 
different regions of the possible combinations of predictive parameters and then 
forming the union of the resulting predictions is the contour plot shown in Fig. 3, 
with the associated rule that "the value of the normalized hybridization intensity 
associated with a particular combination of (Tm-Thyb) and ΔGmfold must be greater
than or equal to some threshold value." In this case, the contour at the threshold 
value becomes the filter. This contour and its interior can be thought of as the 
union of many small rectangular regions ("tiles"), each of which is bracketed by 
low and high cutoff values for each of the parameters. 

The predictions of different algorithm versions can also be combined by 
forming the intersection of two or more different predictions. The reliability of 
predictions within such intersection sets is enhanced because such sets are, by 
definition, insensitive to changes in the details of the predictive algorithm. 
Intersection is a useful method for reducing the number of predicted probes when 
a single algorithm version produces too many candidate probes for efficient 
experimental evaluation. 
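
The two ways of combining predictions can be sketched with Python sets. The function names `tile` and `intersect` and the sample position sets are hypothetical; each set is assumed to hold the starting positions of the probes predicted by one algorithm version.

```python
def tile(*prediction_sets):
    """Union of the predictions of several algorithm versions (tiling)."""
    return set().union(*prediction_sets)


def intersect(first, *others):
    """Predictions common to every algorithm version."""
    return set(first).intersection(*others)


version_a = {12, 13, 14, 15, 16}      # hypothetical predictions, version A
version_b = {14, 15, 16, 17, 30}      # hypothetical predictions, version B

union_set = tile(version_a, version_b)
intersection_set = intersect(version_a, version_b)
```

The union can only grow as algorithm versions are added, while the intersection can only shrink, which is why intersection is suited to reducing the number of candidate probes.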

The most specific oligonucleotide probe set (i.e., the set least likely to 
include poor probes) will be the intersection set from multiple algorithms. Clusters 
that have overlapping oligonucleotide probes from multiple algorithms constitute 
the intersection set of oligonucleotide probes. The oligonucleotide probe that is in 
the center of an intersection cluster is chosen. This central oligonucleotide probe 
may have the highest probability of predicting a peak or, in other words, of binding 
well to the target nucleotide sequence. Oligonucleotide probes on either side of 
center, which are still within the intersection cluster, may also be selected. The 
distance of these "side" oligonucleotide probes from the center generally will be 
shorter or longer depending upon the length of the cluster. 

The most sensitive set of oligonucleotide probes (i.e., the set most likely to 
include at least one good probe) is generally the union set from multiple 
algorithms. Clusters that are predicted by at least one type of algorithm constitute 
the union set of oligonucleotide probes. The oligonucleotide probe in the center of 
a union cluster is chosen. Oligonucleotide probes on either side of center, which 
are still within the union cluster, usually are also chosen. The distance of these
side probes from the center will be shorter or longer depending upon the length of 
the cluster. In summary, the combination of using the stability subsequence 
parameter, tiling multiple filter sets, and making union and intersection cluster sets 
of oligonucleotide probes exhibits very high sensitivity and specificity in predicting 
oligonucleotide probes that effectively hybridize to a target nucleotide sequence of 
interest. 

Another aspect of the present invention is a computer based method for 
predicting the potential of an oligonucleotide to hybridize to a target nucleotide 
sequence. A predetermined number of unique oligonucleotides within a 
nucleotide sequence that is hybridizable with the target nucleotide sequence is 
identified under computer control. The oligonucleotides are chosen to sample the 
entire length of the nucleotide sequence. A value is determined and evaluated 
under computer control for each of the oligonucleotides for at least one parameter 
that is independently predictive of the ability of each of the oligonucleotides to 
hybridize to the target nucleotide sequence. The parameter values are stored. 
Based on the examination of the stored parameter values, a subset of 
oligonucleotides within the predetermined number of unique oligonucleotides is 
identified under computer control. Then, oligonucleotides in the subset that are 
clustered along a region of the nucleotide sequence that is hybridizable to the 
target nucleotide sequence are identified under computer control. 

A computer program is utilized to carry out the above method steps. The 
computer program provides for input of a target-hybridizable or target- 
complementary nucleotide sequence, efficient algorithms for computation of 
oligonucleotide sequences and their associated predictive parameters, efficient, 
versatile mechanisms for filtering sets of oligonucleotide sequences based on 
parameter values, mechanisms for computation of the size of clusters of 
oligonucleotide sequences that pass multiple filters, and mechanisms for 
outputting the final predictions of the method of the present invention in a 
versatile, machine-readable or human-readable form. 

Another aspect of the present invention is a computer system for 
conducting a method for predicting the potential of an oligonucleotide to hybridize 
to a target nucleotide sequence. An input means for introducing a target 
nucleotide sequence into the computer system is provided. The input means may
permit manual input of the target nucleotide sequence. The input means may also
be a database or a standard format file such as GenBank. Also included in the
system is means for determining a number of unique oligonucleotide sequences
that are within a nucleotide sequence that is hybridizable with the target
nucleotide sequence. The oligonucleotide sequences are chosen to sample the
entire length of the nucleotide sequence. Suitable means is a computer program
or software, which also provides memory means for storing the oligonucleotide
sequences. The system also includes means for controlling the computer system
to carry out a determination and evaluation, for each of the oligonucleotide
sequences, of a value for at least one parameter that is independently predictive of
the ability of each of the oligonucleotide sequences to hybridize to the target
nucleotide sequence. Suitable means is a computer program or software such as,
for example, Microsoft® Excel spreadsheet, Microsoft® Access relational
database or the like, which also provides memory means for storing the parameter
values. The system further comprises means for controlling the computer to carry
out an identification of a subset of oligonucleotide sequences within the number of
unique oligonucleotide sequences based on the automated examination of the
stored parameter values. Suitable means is a computer program or software,
which also allocates memory means for storing the subset of oligonucleotides.
The system also includes means for controlling the computer to carry out an
identification of oligonucleotide sequences in the subset that are clustered along a
region of the nucleotide sequence that is hybridizable to the target nucleotide
sequence. Suitable means is a computer program or software, which also
allocates memory means for storing the oligonucleotide sequences in the subset.
The computer system also includes means for outputting data relating to the
oligonucleotide sequences in the subset. Such means may be machine readable
or human readable and may be software that communicates with a printer,
electronic mail, another computer program, and the like. One particularly
attractive feature of the present invention is that the outputting means may
communicate directly with software that is part of an oligonucleotide synthesizer.
In this way the results of the method of the present invention may be used directly
to provide instruction for the synthesis of the desired oligonucleotides.
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Another advantage of the present invention is that it may be used to predict 
efficient hybridization oligonucleotides for each of multiple target sequences. 
Thus, very large arrays may be constructed and tested with minimal synthesis of 
oligonucleotides. 



EXAMPLES 

The invention is demonstrated further by the following illustrative 
examples. Parts and percentages are by weight unless otherwise indicated. 
Temperatures are in degrees Centigrade (°C) unless otherwise specified. The 
following preparations and examples illustrate the invention but are not 
intended to limit its scope. All reagents used herein were from Amresco, Inc., 
Solon, Ohio (buffers), Pharmacia Biotech, Piscataway, N.J. (nucleoside 
triphosphates) or Promega, Madison, Wisconsin (RNA polymerases) unless 
indicated otherwise. 

Example 1 

Synopsis: Data from labeled RNA target hybridizations to surface-bound DNA 
probes directed against 4 different gene sequences were compared to the 
predictions of the preferred version of the prediction algorithm illustrated by the 
flow chart in Fig. 2. The RNA targets were sequences derived from the human 
immunodeficiency virus protease-reverse transcriptase region (HIV PRT; 
sense-strand target polynucleotide), human glyceraldehyde-3-phosphate dehydrogenase 
gene (G3PDH; antisense-strand target polynucleotide), human tumor suppressor 
p53 gene (p53; antisense-strand target polynucleotide) and rabbit β-globin gene 
(β-globin; antisense-strand target polynucleotide). The GenBank accession 
numbers for the gene sequences, number of data points collected and 
temperature of hybridization have all been previously listed in Table 2. 

Materials and Methods: Three different experimental systems and two different 
labeling schemes were used to collect data. 

The sequence and hybridization data for β-globin were taken from the 
literature (see Milner et al. (1997), supra); in this experiment, ^^P-radiolabeled 
RNA target was used. 

The hybridization data for HIV PRT were obtained using an Affymetrix 
GeneChip™ HIV PRT-sense probe array (i.e. sense strand target polynucleotide) 
(GeneChip™ HIV PRT 440s, Affymetrix Corporation, Santa Clara, California) as 
specified by the manufacturer, except that the fluorescein-labeled RNA target was 
not fragmented prior to hybridization and that hybridization was performed for 24 
hours. The concentration of fluorescein-labeled RNA used was 26.3 nM; label 
density was approximately 18 fluoresceinated uridyl nucleotides per 1 kilobase 
(kb) RNA transcript. The raw data were collected by scanning the array with a 
GeneChip™ Scanner 50 (Affymetrix Corporation, Santa Clara, California), as 
specified by the manufacturer. The raw data were reduced to a feature-averaged 
(".CEL") file, using the GeneChip™ software supplied with the scanner. Finally, a 
table of hybridization intensities for perfect-complement 20-mer probes was 
constructed using the ASCII feature map file supplied with the GeneChip™ 
software to connect probe sequences to measured hybridization intensities. The 
resulting data set contained data for every overlapping 20-mer probe to the target 
sequence. 

The data for G3PDH and p53 were measured using 93-feature arrays 
constructed using commercially available streptavidin-coated microtiter plates 
(Pierce Chemical Company, Rockford, IL). Every tenth possible 25-mer probe 
complementary to each target was synthesized and 3'-biotinylated by a contract 
synthesis vendor (Operon, Inc., Alameda, CA). The 3'-linked biotin was used to 
anchor individual probes to microtiter wells, via the well known, strong affinity of 
streptavidin for biotin. Biotinylated DNA probes were resuspended to a 
concentration of 10 µM in hybridization buffer (5x sodium chloride-sodium 
phosphate-disodium ethylenediaminetetraacetate (SSPE), 0.05% Triton X-100, 
filter-sterilized; 1x SSPE is 150 mM sodium chloride, 10 mM sodium phosphate, 1 
mM disodium ethylenediaminetetraacetate (EDTA), pH 7.4). Individual probes 
were diluted 1:10 in hybridization buffer into specified wells (100 µl total volume 
per well) of a streptavidin-coated microtiter plate; probes were allowed to bind to 
the covered plates overnight at 35°C. The other 3 wells of the 96-well microtiter 
plate were probe-less controls. The coated plates were washed with 3 x 200 µl of 
wash buffer (6x SSPE, 0.005% Triton X-100, filter-sterilized). Fluorescein-labeled 
RNA (100 µl of a 10 nM solution in hybridization buffer) was added to each well. 
The plates were covered and hybridized at 35°C for 20-24 hours. The hybridized 
plates were washed with 3 x 200 µl of wash buffer. Label was then released in 
each well by adding 100 µl of 20 µg/ml RNAase I (Sigma Chemical Company, St. 
Louis, MO) in Tris-EDTA (TE) (10 mM Tris(hydroxymethyl)aminomethane (Tris), 1 
mM EDTA, pH 8.0, sterile) and incubating at 35°C for at least 30 minutes. The 
fluorescence released from the surface of each well was quantitated with a 
PerSeptive Biosystems Cytofluor II microtiter plate fluorimeter (PerSeptive 
Biosystems, Inc., Framingham, MA) using the manufacturer's recommended 
excitation and emission filter sets for fluorescein. Each plate hybridization was 
performed in quadruplicate, and the data for each probe were averaged to obtain 
the hybridization intensity. 

Labeled RNA targets specific for G3PDH and p53 were produced via T7 
RNA polymerase transcription of DNA templates in the presence of 
fluorescein-UTP (Boehringer Mannheim Corporation, Indianapolis, IN), using the same 
method as that outlined by Affymetrix for their GeneChip™ HIV PRT sense probe 
array. The DNA template for G3PDH was purchased from a commercial source 
(Clontech, Inc., Palo Alto, CA). The DNA template for p53 was obtained by 
subcloning a PCR fragment from an ATCC-derived reference clone (No. 57254) of 
human p53 into the commercially-available PCR cloning vector pCR2.1-TOPO 
(Invitrogen, Inc., Carlsbad, CA), then linearizing the plasmid at the end of the 
polycloning site opposite the vector-derived T7 promoter. 
Probe predictions were performed using a software application (referred to 
as "p5") that was built atop Microsoft's Access relational database application, 
using added Visual Basic modules, the TrueDB Grid Pro 5.0 (Apex Software 
Corporation, Pittsburgh, PA) enhancement to Visual Basic, and a version of the 
FORTRAN application MFOLD, modified to run in a Windows NT 4.0 environment, 
as an ActiveX control. The Visual Basic source code for the p5 software 
application is found in the Microfiche appendix to this specification. The DNA 
target sequence complements that were input into p5 for division into potential 
oligonucleotide probe sequences are listed below: 

Parent Sequence Accession No.: K03256 
Locus: BUNGLOB.DNA (portion of rabbit β-globin) 
Length: 122 

1 TTCTTCCACA TTCACCTTGC CCCACAGGGC AGTGACCGCA GACTTCTCCT CACTGGACAG 
61 ATGCACCATT CTGTCTGTTT TGGGGGATTG CAAGTAAACA CAGTTGTGTC AAAAGCAAGT 
121 GT SEQ ID NO: 36 



Parent Sequence Accession No.: M15654 
Locus: HIV_PRTA.S (HIV PRT antisense; parses into probes specific for 
sense-strand target) 
Length: 1040 

1 TGTACTGTCC ATTTATCAGG ATGGAGTTCA TAACCCATCC AAAGGAATGG AGGTTCTTTC 
61 TGATGTTTTT TGTCTGGTGT GGTAAGTCCC CACCTCAACA GATGTTGTCT CAGCTCCTCT 
121 ATTTTTGTTC TATGCTGCCC TATTTCTAAG TCAGATCCTA CATACAAATC ATCCATGTAT 
181 TGATAGATAA CTATGTCTGG ATTTTGTTTT TTAAAAGGCT CTAAGATTTT TGTCATGCTA 
241 CTTTGGAATA TTGCTGGTGA TCCTTTCCAT CCCTGTGGAA GCACATTGTA CTGATATCTA 
301 ATCCCTGGTG TCTCATTGTT TATACTAGGT ATGGTAAATG CAGTATACTT CCTGAAGTCT 
361 TCATCTAAGG GAACTGAAAA ATATGCATCA CCCACATCCA GTACTGTTAC TGATTTTTTC 
421 TTTTTTAACC CTGCGGGATG TGGTATTCCT AATTGAACTT CCCAGAAGTC TTGAGTTCTC 
481 TTATTAAGTT CTCTGAAATC TACTAATTTT CTCCATTTAG TACTGTCTTT TTTCTTTATG 
541 GCAAATACTG GAGTATTGTA TGGATTCTCA GGCCCAATTT TTGAAATTTT CCCTTCCTTT 
601 TCCATTTCTG TACAAATTTC TACTAATGCT TTTATTTTTT CTTCTGTCAA TGGCCATTGT 
661 TTAACTTTTG GGCCATCCAT TCCTGGCTTT AATTTTACTG GTACAGTCTC AATAGGGCTA 
721 ATGGGAAAAT TTAAAGTGCA ACCAATCTGA GTCAACAGAT TTCTTCCAAT TATGTTGACA 
781 GGTGTAGGTC CTACTAATAC TGTACCTATA GCTTTATGTC CACAGATTTC TATGAGTATC 
841 TGATCATACT GTCTTACTTT GATAAAACCT CCAATTCCCC CTATCATTTT TGGTTTCCAT 
901 CTTCCTGGCA AACTCATTTC TTCTAATACT GTATCATCTG CTCCTGTATC TAATAGAGCT 
961 TCCTTTAGTT GCCCCCCTAT CTTTATTGTG ACGAGGGGTC GTTGCCAAAG AGTGATCTGA 
1021 GGGAAGTTAA AGGATACAGT SEQ ID NO: 37 



Parent Sequence Accession No.: X01677 
Locus: G3PDH (Clontech G3PDH template - parses into probes specific for 
antisense-strand target) 
Length: 999 

1 GAAGGTCGGA GTCAACGGAT TTGGTCGTAT TGGGCGCCTG GTCACCAGGG CTGCTTTTAA 
61 CTCTGGTAAA GTGGATATTG TTGCCATCAA TGACCCCTTC ATTGACCTCA ACTACATGGT 
121 TTACATGTTC CAATATGATT CCACCCATGG CAAATTCCAT GGCACCGTCA AGGCTGAGAA 
181 CGGGAAGCTT GTCATCAATG GAAATCCCAT CACCATCTTC CAGGAGCGAG ATCCCTCCAA 
241 AATCAAGTGG GGCGATGCTG GCGCTGAGTA CGTCGTGGAG TCCACTGGCG TCTTCACCAC 
301 CATGGAGAAG GCTGGGGCTC ATTTGCAGGG GGGAGCCAAA AGGGTCATCA TCTCTGCCCC 
361 CTCTGCTGAT GCCCCCATGT TCGTCATGGG TGTGAACCAT GAGAAGTATG ACAACAGCCT 
421 CAAGATCATC AGCAATGCCT CCTGCACCAC CAACTGCTTA GCACCCCTGG CCAAGGTCAT 
481 CCATGACAAC TTTGGTATCG TGGAAGGACT CATGACCACA GTCCATGCCA TCACTGCCAC 
541 CCAGAAGACT GTGGATGGCC CCTCCGGGAA ACTGTGGCGT GATGGCCGCG GGGCTCTCCA 
601 GAACATCATC CCTGCCTCTA CTGGCGCTGC CAAGGCTGTG GGCAAGGTCA TCCCTGAGCT 
661 AGACGGGAAG CTCACTGGCA TGGCCTTCCG TGTCCCCACT GCCAACGTGT CAGTGGTGGA 
721 CCTGACCTGC CGTCTAGAAA AACCTGCCAA ATATGATGAC ATCAAGAAGG TGGTGAAGCA 
781 GGCGTCGGAG GGCCCCCTCA AAGGCATCCT GGGCTACACT GAGCACCAGG TGGTCTCCTC 
841 TGACTTCAAC AGCGACACCC ACTCCTCCAC CTTTGACGCT GGGGCTGGCA TTGCCCTCAA 
901 CGACCACTTT GTCAAGCTCA TTTCCTGGTA TGACAACGAA TTTGGCTACA GCAACAGGGT 
961 GGTGGACCTC ATGGCCCACA TGCTATAGTG AGTCGTATT SEQ ID NO: 38 

Parent Sequence Accession No.: X54156 
Locus: HSP53PCRa (p53 template - parses into probes specific for 
antisense-strand target) 
Length: 1049 

1 GAGGTGCGTG TTTGTGCCTG TCCTGGGAGA GACCGGCGCA CAGAGGAAGA GAATCTCCGC 
61 AAGAAAGGGG AGCCTCACCA CGAGCTGCCC CCAGGGAGCA CTAAGCGAGC ACTGCCCAAC 
121 AACACCAGCT CCTCTCCCCA GCCAAAGAAG AAACCACTGG ATGGAGAATA TTTCACCCTT 
181 CAGATCCGTG GGCGTGAGCG CTTCGAGATG TTCCGAGAGC TGAATGAGGC CTTGGAACTC 
241 AAGGATGCCC AGGCTGGGAA GGAGCCAGGG GGGAGCAGGG CTCACTCCAG CCACCTGAAG 
301 TCCAAAAAGG GTCAGTCTAC CTCCCGCCAT AAAAAACTCA TGTTCAAGAC AGAAGGGCCT 
361 GACTCAGACT GACATTCTCC ACTTCTTGTT CCCCACTGAC AGCCTCCCTC CCCCATCTCT 
421 CCCTCCCCTG CCATTTTGGG TTTTGGGTCT TTGAACCCTT GCTTGCAATA GGTGTGCGTC 
481 AGAAGCACCC AGGACTTCCA TTTGCTTTGT CCCGGGGCTC CACTGAACAA GTTGGCCTGC 
541 ACTGGTGTTT TGTTGTGGGG AGGAGGATGG GGAGTAGGAC ATACCAGCTT AGATTTTAAG 
601 GTTTTTACTG TGAGGGATGT TTGGGAGATG TAAGAAATGT TCTTGCAGTT AAGGGTTAGT 
661 TTACAATCAG CCACATTCTA GGTAGGTAGG GGCCCACTTC ACCGTACTAA CCAGGGAAGC 
721 TGTCCCTCAT GTTGAATTTT CTCTAACTTC AAGGCCCATA TCTGTGAAAT GCTGGCATTT 
781 GCACCTACCT CACAGAGTGC ATTGTGAGGG TTAATGAAAT AATGTACATC TGGCCTTGAA 
841 ACCACCTTTT ATTACATGGG GTCTAAAACT TGACCCCCTT GAGGGTGCCT GTTCCCTCTC 
901 CCTCTCCCTG TTGGCTGGTG GGTTGGTAGT TTCTACAGTT GGGCAGCTGG TTAGGTAGAG 
961 GGAGTTGTCA AGTCTTGCTG GCCCAGCCAA ACCCTGTCTG ACAACCTCTT GGTCGACCTT 
1021 AGTACCTAAA AGGAAATCTC ACCCCATCC SEQ ID NO: 39 

The sequences indicated above, which are complements of the target 
sequences, were divided into overlapping oligonucleotide sequences with one 
nucleotide between starting positions. The oligonucleotide sequence lengths 
were 17 (rabbit β-globin), 20 (HIV PRT) or 25 (G3PDH; p53). The oligonucleotide 
sequence lengths were dictated by the probe lengths used in the experiments to 
which the predictions were compared. The RNA target concentrations used to 
calculate predicted RNA/DNA duplex melting temperatures were 100 pM (rabbit 
β-globin), 26.3 nM (HIV PRT) and 10 nM (G3PDH; p53). These were also dictated 
by experimental conditions for the comparison data. The cut-off filter used for the 
predicted free energy of the most stable probe sequence intramolecular structure, 
ΔGmfold, was 

[cut-off value not legible] kcal/mole. 

The filter condition used for the predicted RNA/DNA duplex melting temperature 
was 

\[ 25\,^{\circ}\mathrm{C} < T_m + 16.6\,\log([\mathrm{Na}^+]) - T_{hyb} < 50\,^{\circ}\mathrm{C}, \] 

where Tm is the target concentration-dependent value of the predicted RNA/DNA 
duplex melting temperature before correction for salt concentration, the term "16.6 
log([Na+])" corrects the melting temperature for salt effects, and Thyb is the 
hybridization temperature. The values of the salt correction term and Thyb have 
already been listed in Table 2. For convenient use within p5, the above condition 
was algebraically rearranged into the equivalent form 

\[ 25\,^{\circ}\mathrm{C} - 16.6\,\log([\mathrm{Na}^+]) + T_{hyb} < T_m < 50\,^{\circ}\mathrm{C} - 16.6\,\log([\mathrm{Na}^+]) + T_{hyb}. \] 
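
As a minimal sketch of the rearranged condition, the following Python fragment computes the allowed window for the uncorrected Tm and applies it as a filter. The function names, the use of a base-10 logarithm, and the example values (1.0 M sodium, 30°C hybridization temperature) are assumptions made for illustration; they are not taken from Table 2 or from the p5 source code.

```python
import math


def tm_window(na_molar, t_hyb_c, low_c=25.0, high_c=50.0):
    """Return (lower, upper) bounds on the uncorrected Tm in degrees C.

    Rearranged form: low - 16.6*log10([Na+]) + Thyb < Tm < high - 16.6*log10([Na+]) + Thyb.
    The base-10 logarithm is an assumption of this sketch.
    """
    salt_term = 16.6 * math.log10(na_molar)
    return (low_c - salt_term + t_hyb_c, high_c - salt_term + t_hyb_c)


def passes_tm_filter(tm_c, na_molar, t_hyb_c):
    """True when the uncorrected Tm lies strictly inside the window."""
    lower, upper = tm_window(na_molar, t_hyb_c)
    return lower < tm_c < upper


# Hypothetical conditions, chosen only to make the arithmetic easy to follow.
lower, upper = tm_window(na_molar=1.0, t_hyb_c=30.0)
print(lower, upper)
print(passes_tm_filter(60.0, 1.0, 30.0), passes_tm_filter(54.0, 1.0, 30.0))
```

Computing the two bounds once per target, rather than correcting every Tm individually, is the practical benefit of the rearranged form: each oligonucleotide then needs only a single range comparison.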

Clusters were ranked according to the number of contiguous oligonucleotide 
sequences that passed through the filter set ("contig" length). 
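
The division into overlapping oligonucleotides and the ranking of contigs can be sketched as follows. This is an illustrative Python fragment, not the p5 Visual Basic source; the function names are hypothetical. The input sequence is SEQ ID NO: 36, and the example set of passing positions is read from the contig length column of Table 3.

```python
def tile(sequence, length, step=1):
    """Yield (1-based start position, oligonucleotide) for each window."""
    for start in range(0, len(sequence) - length + 1, step):
        yield start + 1, sequence[start:start + length]


def rank_contigs(passing_positions):
    """Group passing start positions into runs of consecutive positions.

    Returns (start, contig_length) tuples, longest contig first.
    """
    runs = []
    for pos in sorted(passing_positions):
        if runs and pos == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((pos, 1))                      # start a new run
    return sorted(runs, key=lambda run: -run[1])


# SEQ ID NO: 36 (rabbit beta-globin complement, 122 nucleotides).
BETA_GLOBIN = "".join([
    "TTCTTCCACA", "TTCACCTTGC", "CCCACAGGGC", "AGTGACCGCA", "GACTTCTCCT",
    "CACTGGACAG", "ATGCACCATT", "CTGTCTGTTT", "TGGGGGATTG", "CAAGTAAACA",
    "CAGTTGTGTC", "AAAAGCAAGT", "GT",
])

oligos = list(tile(BETA_GLOBIN, 17))
print(len(oligos), oligos[0])

# Positions that carry a contig length in Table 3.
passing = set(range(5, 12)) | set(range(32, 39)) | {62, 63, 70, 71}
print(rank_contigs(passing))
```

Passing `step=10` to `tile` gives the "every tenth possible probe" sampling used for the G3PDH and p53 arrays.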

Results: The detailed analysis results for rabbit β-globin are presented in Table 3; 
a graphical summary of the results is shown in Fig. 4. In Table 3, values of Tm 
and ΔGmfold that were excluded by the filter set are shown with a line through 
them, and table entries for contig length are shown in gray when the 
oligonucleotide sequence in question was not in a contig. The top 20% of the 
observed hybridization intensities are shown underlined. 



Attorney Docket No. 10971464-1 



Table 3 

(-- : entry not legible) 

Position  Oligonucleotide    SEQ ID  Tm (°C)  ΔGmfold      Contig  Hybridization Intensity 
          Sequence           NO:              (kcal/mole)  Length  (Milner et al., 1997) 

1         TTCTTCCACATTCACCT  40      --       --           --      -- 
2         TCTTCCACATTCACCTT  41      --       --           --      -- 
3         CTTCCACATTCACCTTG  42      --       --           --      -- 
4         TTCCACATTCACCTTGC  43      --       --           --      -- 
5         TCCACATTCACCTTGCC  44      --       --           7       -- 
6         CCACATTCACCTTGCCC  45      --       --           7       -- 
7         CACATTCACCTTGCCCC  46      --       --           7       -- 
8         ACATTCACCTTGCCCCA  47      --       --           7       -- 
9         CATTCACCTTGCCCCAC  48      61.10    --           7       -- 
10        ATTCACCTTGCCCCACA  49      --       --           7       -- 
11        TTCACCTTGCCCCACAG  50      61.33    --           7       -- 
12        TCACCTTGCCCCACAGG  51      --       --           --      390 
13        CACCTTGCCCCACAGGG  52      --       --           --      410 
14        ACCTTGCCCCACAGGGC  53      68.01    --           --      940 
15        CCTTGCCCCACAGGGCA  54      --       --           --      50 
16        CTTGCCCCACAGGGCAG  55      --       --           --      90 
17        TTGCCCCACAGGGCAGT  56      66.31    --           --      90 
18        TGCCCCACAGGGCAGTG  57      --       --           --      20 
19        GCCCCACAGGGCAGTGA  58      --       --           --      90 
20        CCCCACAGGGCAGTGAC  59      --       --           --      40 
21        CCCACAGGGCAGTGACC  60      --       --           --      90 
22        CCACAGGGCAGTGACCG  61      --       --           --      20 
23        CACAGGGCAGTGACCGC  62      --       --           --      20 
24        ACAGGGCAGTGACCGCA  63      60.14    --           --      90 
25        CAGGGCAGTGACCGCAG  64      --       --           --      30 
26        AGGGCAGTGACCGCAGA  65      59.00    --           --      90 
27        GGGCAGTGACCGCAGAC  66      --       --           --      30 
28        GGCAGTGACCGCAGACT  67      --       --           --      30 
29        GCAGTGACCGCAGACTT  68      --       --           --      30 
30        CAGTGACCGCAGACTTC  69      --       --           --      40 
31        AGTGACCGCAGACTTCT  70      --       --           --      40 
32        GTGACCGCAGACTTCTC  71      55.99    0.60         7       100 
33        TGACCGCAGACTTCTCC  72      57.01    0.60         7       120 
34        GACCGCAGACTTCTCCT  73      59.22    0.60         7       180 
35        ACCGCAGACTTCTCCTC  74      59.28    0.60         7       210 
36        CCGCAGACTTCTCCTCA  75      60.07    0.60         7       200 
37        CGCAGACTTCTCCTCAC  76      56.34    0.60         7       190 
38        GCAGACTTCTCCTCACT  77      57.79    --           7       240 
39        CAGACTTCTCCTCACTG  78      --       --           --      240 
40        AGACTTCTCCTCACTGG  79      --       --           --      340 
41        GACTTCTCCTCACTGGA  80      55.77    --           --      340 
42        ACTTCTCCTCACTGGAC  81      --       --           --      240 
43        CTTCTCCTCACTGGACA  82      55.75    --           --      240 
44        TTCTCCTCACTGGACAG  83      --       --           --      120 
45        TCTCCTCACTGGACAGA  84      --       --           --      100 
46        CTCCTCACTGGACAGAT  85      --       --           --      110 
47        TCCTCACTGGACAGATG  86      51.10    --           --      80 
48        CCTCACTGGACAGATGC  87      --       0.00         --      240 
49        CTCACTGGACAGATGCA  88      --       0.20         --      90 
50        TCACTGGACAGATGCAC  89      --       0.20         --      30 
51        CACTGGACAGATGCACC  90      --       0.50         --      100 
52        ACTGGACAGATGCACCA  91      --       --           --      80 
53        CTGGACAGATGCACCAT  92      --       --           --      90 
54        TGGACAGATGCACCATT  93      --       --           --      80 
55        GGACAGATGCACCATTC  94      51.75    0.30         --      180 
56        GACAGATGCACCATTCT  95      --       -0.10        --      220 
57        ACAGATGCACCATTCTG  96      --       --           --      120 
58        CAGATGCACCATTCTGT  97      --       --           --      120 
59        AGATGCACCATTCTGTC  98      52.05    -0.10        --      250 
60        GATGCACCATTCTGTCT  99      --       0.30         --      520 
61        ATGCACCATTCTGTCTG  100     52.50    0.40         --      980 
62        TGCACCATTCTGTCTGT  101     56.05    0.20         2       780 
63        GCACCATTCTGTCTGTT  102     56.52    0.20         2       810 
64        CACCATTCTGTCTGTTT  103     --       0.20         --      220 
65        ACCATTCTGTCTGTTTT  104     --       0.20         --      120 
66        CCATTCTGTCTGTTTTG  105     --       0.20         --      120 
67        CATTCTGTCTGTTTTGG  106     --       0.60         --      160 
68        ATTCTGTCTGTTTTGGG  107     --       1.70         --      310 
69        TTCTGTCTGTTTTGGGG  108     --       1.70         --      250 
70        TCTGTCTGTTTTGGGGG  109     55.90    1.70         2       80 
71        CTGTCTGTTTTGGGGGA  110     55.91    1.40         2       30 
72        TGTCTGTTTTGGGGGAT  111     --       0.90         --      50 
73        GTCTGTTTTGGGGGATT  112     --       0.90         --      10 
74        TCTGTTTTGGGGGATTG  113     --       1.10         --      10 
75        CTGTTTTGGGGGATTGC  114     --       2.20         --      10 
76        TGTTTTGGGGGATTGCA  115     --       1.20         --      10 
77        GTTTTGGGGGATTGCAA  116     --       0.00         --      5 
78        TTTTGGGGGATTGCAAG  117     --       -0.20        --      5 
79        TTTGGGGGATTGCAAGT  118     --       -0.20        --      5 
80        TTGGGGGATTGCAAGTA  119     --       0.00         --      5 
81        TGGGGGATTGCAAGTAA  120     --       1.20         --      5 
82        GGGGGATTGCAAGTAAA  121     --       1.40         --      5 
83        GGGGATTGCAAGTAAAC  122     --       1.40         --      5 
84        GGGATTGCAAGTAAACA  123     --       1.30         --      5 
85        GGATTGCAAGTAAACAC  124     --       0.90         --      5 
86        GATTGCAAGTAAACACA  125     --       0.50         --      5 
87        ATTGCAAGTAAACACAG  126     --       0.50         --      5 
88        TTGCAAGTAAACACAGT  127     --       0.50         --      5 
89        TGCAAGTAAACACAGTT  128     --       0.30         --      5 
90        GCAAGTAAACACAGTTG  129     --       0.10         --      10 
91        CAAGTAAACACAGTTGT  130     --       -0.30        --      5 
92        AAGTAAACACAGTTGTG  131     --       --           --      5 
93        AGTAAACACAGTTGTGT  132     --       --           --      5 
94        GTAAACACAGTTGTGTC  133     --       --           --      5 
95        TAAACACAGTTGTGTCA  134     --       --           --      5 
96        AAACACAGTTGTGTCAA  135     --       --           --      5 
97        AACACAGTTGTGTCAAA  136     --       --           --      5 
98        ACACAGTTGTGTCAAAA  137     --       --           --      10 
99        CACAGTTGTGTCAAAAG  138     --       --           --      15 
100       ACAGTTGTGTCAAAAGC  139     --       --           --      30 
101       CAGTTGTGTCAAAAGCA  140     --       0.20         --      25 
102       AGTTGTGTCAAAAGCAA  141     --       -0.10        --      25 
103       GTTGTGTCAAAAGCAAG  142     --       -0.30        --      20 
104       TTGTGTCAAAAGCAAGT  143     --       -0.10        --      -- 
105       TGTGTCAAAAGCAAGTG  144     --       --           --      20 



In Fig. 4, the hybridization intensity observed experimentally is plotted as a 
function of oligonucleotide starting position in the target-complementary sequence 
that was input into p5. The identified contigs are plotted as horizontal bars, with 
the contig rank (by length) shown in parentheses next to each bar. It is clear from 
Table 3 and Fig. 4 that the prediction algorithm identified contigs that overlap all of 
the "top 20%" hybridization intensity peaks observed. Iterative experimental 
improvement of these predictions would converge on each of the observed 
intensity maxima in 3-4 iterations. 

Prediction worksheets for HIV PRT, G3PDH and p53 were prepared in a 
manner similar to that for rabbit β-globin as shown in Table 3, except that the 
probes were longer as indicated above and that approximately 1,000 probes were 
analyzed for each of these genes. The results of these analyses are shown in 
Fig. 5 (HIV PRT), Fig. 6 (G3PDH) and Fig. 7 (p53). In Fig. 5, data are plotted for 
all possible 20-mer oligonucleotide probes. In Figs. 6 and 7, data were available 
for only every 10th 25-mer probe, and the actual data points are plotted as open 
diamonds. 

It is clear from Figs. 5-7 that the hybridization efficiency prediction algorithm of the 
present invention performed well in the task of identifying regions with observed 
high hybridization intensity. In each case, the 4 longest contigs point to 
good-to-excellent regions for experimental investigation. It should be noted that the 
contigs usually bracket observed intensity peaks; experimental iterative 
refinement would therefore be expected to converge in 2-3 iterations. By this is 
meant that certain oligonucleotides from the identified contigs are prepared and 
subjected to evaluation in actual hybridization experiments. Based on the results 
of such experiments, the observed signal is evaluated to determine whether the 
oligonucleotides are hybridizing to the left of, the right of, or on the center of a 
peak with respect to the graphed data. The next iteration is carried out to 
experimentally evaluate the hybridization efficiency of probes that are inferred to 
lie closer to the peak of hybridization efficiency, based on the data from the 
previous iteration. Iteration is continued until the signal level is deemed 
acceptable by the user, or the local hybridization efficiency maximum is reached 
(i.e. the best probe in the cluster identified by the method of the current invention 
has been experimentally identified). A detailed illustration of this process is shown 
in Example 3. 
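
One way to picture the left-of, right-of or on-the-peak reasoning is as a search over probe positions. The Python sketch below uses a ternary search, which is a stand-in chosen for illustration and not a procedure stated in this specification; `measure` is a hypothetical callable representing one hybridization experiment, and the search assumes a single intensity peak inside the interval examined.

```python
def refine_probe_position(measure, start, end):
    """Narrow [start, end] to the position with the highest measured signal.

    `measure(position)` is a hypothetical stand-in for one hybridization
    experiment. The search assumes a single peak within the interval.
    """
    measured = {}

    def signal(position):
        # Each probe is "synthesized and tested" at most once.
        if position not in measured:
            measured[position] = measure(position)
        return measured[position]

    low, high = start, end
    iterations = 0
    while high - low > 2:
        step = (high - low) // 3
        left, right = low + step, high - step
        if signal(left) < signal(right):
            low = left + 1   # the left probe lies to the left of the peak
        else:
            high = right     # the right probe lies on or to the right of the peak
        iterations += 1
    best = max(range(low, high + 1), key=signal)
    return best, iterations, len(measured)


# Simulated intensity profile with one peak at position 37 (made-up data).
best, iterations, probes_tested = refine_probe_position(
    lambda p: -abs(p - 37), 1, 106)
print(best, iterations, probes_tested)
```

Real hybridization data are noisy and may show several local maxima, so in practice the starting interval would be a single predicted contig rather than the whole sequence, and the stopping rule would be the user's acceptable signal level.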

It should be noted that clusters of predictions that overlap the maxima of observed 
peaks of hybridization efficiency will often yield user-acceptable probes on the first 
iteration. Thus, the method of the present invention is much more efficient than 
current methods in which every potential probe is synthesized. For instance, in 
the HIV PRT example shown in Fig. 5, at least 3 good probes would be identified 
after synthesis of ~10 test probes (i.e. statistical sampling of the 3 longest 
contigs). This is much more efficient than synthesizing the ~1,000 probes 
represented by the data in Fig. 5. 

Example 2 

Synopsis: Data from a labeled RNA target hybridization to an Affymetrix 
GeneChip™ HIV PRT-sense probe array (GeneChip™ HIV PRT 440s, Affymetrix 
Corporation, Santa Clara, CA) were compared to the predictions of the 
window-averaged composite dimensionless score version of the method of the present 
invention. 

Materials and Methods: Data were obtained as described for the Affymetrix 
GeneChip™ HIV PRT-sense probe array (GeneChip™ HIV PRT 440s, Affymetrix 
Corporation, Santa Clara, California) in Example 1. The DNA sequence (SEQ ID 
NO: 37) complementary to the fluorescein-labeled RNA target was divided into 
overlapping 20-mer oligonucleotide sequences spaced one nucleotide apart, 
using the prototype application p5; p5 was also used to calculate the predicted 
values of the RNA/DNA heteroduplex melting temperature (Tm) and the free 
energy of the most stable predicted probe intramolecular structure, ΔGmfold, as 
described in Example 1. The probe sequences and parameter values were then 
transferred to a Microsoft® Excel® spreadsheet, which was used to complete the 
predictions of efficient and inefficient probes. The weights were obtained by 
optimizing the performance of the algorithm with the data of Milner et al., supra, as 
the training data using the Microsoft® Excel® spreadsheet software. The 
composite score was calculated using a weight of 0.62 for the dimensionless Tm 
score and a weight of 0.38 for the ΔGmfold dimensionless score. The window 
averaging was performed using a window width of 7 and Microsoft® Excel® 
spreadsheet software. Finally, the oligonucleotide sequences having the top 10% 
of the window-averaged composite dimensionless scores were predicted to be 
efficient probes, while the oligonucleotide sequences having the bottom 10% of 
the window-averaged composite dimensionless scores were predicted to be 
inefficient probes. 
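
The scoring steps described above can be sketched in Python as follows. This is an illustration rather than the spreadsheet actually used: the function names are hypothetical, the window is assumed to be centred on each position, and positions without a full window are assumed to receive no averaged score. The seven score pairs in the example are the dimensionless Tm and ΔGmfold scores listed in Table 4 for the probes beginning GGTAAGTCCC through TCCCCACCTC.

```python
def composite_score(tm_score, dg_score, w_tm=0.62, w_dg=0.38):
    """Weighted sum of the two dimensionless scores."""
    return w_tm * tm_score + w_dg * dg_score


def window_average(values, width=7):
    """Centred moving average; positions without a full window get None."""
    half = width // 2
    averaged = [None] * len(values)
    for i in range(half, len(values) - half):
        averaged[i] = sum(values[i - half:i + half + 1]) / width
    return averaged


def classify(scores, fraction=0.10):
    """Label the top and bottom `fraction` of the scored positions."""
    ranked = sorted((s, i) for i, s in enumerate(scores) if s is not None)
    n = int(len(ranked) * fraction)
    labels = {i: "neither" for _, i in ranked}
    for _, i in ranked[:n]:
        labels[i] = "inefficient"
    for _, i in ranked[len(ranked) - n:]:
        labels[i] = "efficient"
    return labels


tm_scores = [1.580, 1.262, 0.999, 1.073, 1.366, 1.789, 1.379]
dg_scores = [-0.547, 0.670, 0.670, 0.323, 0.323, 0.062, -0.199]
composites = [composite_score(t, d) for t, d in zip(tm_scores, dg_scores)]
print([round(c, 3) for c in composites])
print(window_average(composites)[3])
```

With only seven inputs, a single position (the fourth) has a full window; on the complete set of roughly 1,000 probes, `classify` would be applied to the whole list of window-averaged scores.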

Results: The calculated parameters and scores are shown in Table 4; the 
algorithm predictions are also shown diagrammatically in Figure 8. In Table 4, 
window-averaged composite score values that were in the top 10% of the 
distribution of values are shown in bold type, values that were in the bottom 10% 
are shown in italics, and all other values are shown with a line through them. It is 
clear from both Table 4 and Figure 8 that the window-averaged composite 
dimensionless score embodiment of the current invention correctly predicted both 
efficient and inefficient hybridization probes for HIV PRT sense-strand RNA. As in 
Example 1, statistical sampling of contiguous stretches of predicted "good" probes 
would lead to convergence of the design process to the best probes in each 
region in 2-4 design iterations. 






It 



4 



hi 
o 



> © o 
o 



o 



CD 



u 

CO 



■5 S2 



o 




(A a> 

O 

o 



o 

O iT 



I— o 

CO 



So o 

CO *^ 



©) 



O 



00 
LU Z 
CO 



0> 

o 
c 

0) 
3 

o- 

CO 

S 

Q. 
< 



*? 



co 



CO 
CO 



to 



00 

CO 

o 



CM 
CO 



00 

CO 
CM 



CO 
CO 



h- 



CD 
CO 



o 
o 

05 



CO 
10 



u 
o 
Q 

E 
o 



CI 

Ci 



C» 
▼ ■ 



GO 
CZ) 



CO 



CD 

CM 
O 



CO 
CO 

in 



CM 



10 
IT) 



in 
10 



CM 

o 



10 

C3) 
00 

CD 



CO 
00 



CM 
CD 

o 



CM 
00 
Oi 



CO 

00 

CM 



o 



o 

CO 
CO 



CM 
CD 
CM 



O 
h- 

C3 



CO 



O 
CO 



CO 
CM 



o 

CD 



o 

CD 
ci 



o 

CM 
CD 



O- 
CD 



O 
CO 

CO 



o 
o 

CM 



O 
CD 
C3 



CD 

CO 
CO 



10 



to 



CO 
CO 



CO 

to 

a> 
10 



00 
10 

00 
to 



CO 

to 



CD 



tO 
CO 
CO 



CM 

CM 
CD 



CM 
CD 

CD 



CO 

ai 

CO 



CO 



CO 



CO 
CO 



CO 



o 
in 



CO 

m 



m 
in 



in 



in 



eg 



in 



cn 



8 

C5 



6 



I 

I 



u 
u 

i 

H 

i 

O 

I 



6 

u 
u 

I 

i 
i 



U 

g 

u 
u 

i 

1: 

CD 



to 



CO 



CD 



00 



o 

CN 



CN 
CM 



CM 



CO 
CM 



CD 

o 



2:5 IS 

> O Q 
O 



• T3 ^ 

.E « E " 



(0 o 

o 



o 

o £: 



01 

- o 

CO 



-9? 

CD 



a> 

So O 

O p o 

=1 o 



® 



< _ 



oo 

UJ z 
CO 

a> 
u 
c 

3 

cr 
a> 
CO 
a> 

2 

Q. 
< 



S c 

O.S 
Q. a> 
«> I? 



T CI 

C 

Cf Ci 



CM 



CM 



CO 



CO 



CD CM 



CO 



CO O 
CM 



o 
o 
O 



CO 
■r- 

o 



5 J> « <D 

5 « o o 



0) 
JD 
CO 



? -c 12 

> g Q 
X g 

o 



CO 0> 

g.5 

o 



o 

-I m 
O £ 



CO 



3 o O 



< _ 



oo 

UJ z 
cn 



o 
c 

0) 
3 

o- 

0) 
CO 
0) 

o 
< 



•g.2 

CL (0 

«> I? 



CO 



CO 



CO 



CO 



00 



CO 



o 

CO 



csi 
oo 

CO 



CO 
00 



CD 

o 



CO 
CM 



O 

CO 
CD 



CO 
LO 



CO 



CO 
CO 
CM 

CM* 



CO 
OO 



CO 

to 



m 

CO 



lO 

o 



CM 



o 



o 



CM 



o 
to 

CO 



o 

lO 
CM 



o 

lO 
CN 



o 



CM 



oo 



to 
o 



a> 

CO 



CM 
CSI 
CM 



CJ 
O 

6 

U 
O 
CJ 

i 

a 

H 



U 
U 



o 

CO 



o 
o 

c 




0) 

o 
o 
Q 



HIV PRT 
GeneChip™ 
Data 


1213.6 


1106.1 


1009.0 


1656.2 


2178.3 


2567.01 


3000.5 


2025.4 


429.2 


157.9 


135.3 


330.8 


900.0 


1177.0 


795.1 


889.2 


1703.6 


3115.2 


4445.0 


6762.8 


8845.0 


9010.6 


19941.0 


12577.0 


7503.3 


7033.8 


8276,7 


Window- 
Averaged 
Composite 
Score 






1 0.872 


0.908 












c 
c 


i 






c 
^ 
o 

c 


> 




^ 
c 


! 








0.847 


o 
o 


CM 
CM 


CO 

u> 

CM 


o 

CO 


CM 




oo 
o> 
o 


CM 
O 


Composite 
Score 


0.772 


1.037 


0,874 


0.788 


0.970 


1.132 


0,779 


0.240 


-0.030 


-0.223 


-0.387 


-0.363 


-0.303 


-0.639 


-0.428 


-0.249 


0.272 


0.889 


CO 

CM 
O 




o> 

CO 

in 


CO 


CO 


00 
CO 
CO 


c> 


C) 


CN 


o 

o S: 
= o 

< 


-0.547 


0.670 


j 0.670 


0.323 


0.323 


0.062 


-0.199 


-1.243 


-1.851 


-1.851 


-1.851 


-1.851 


-1.851 


-1.851 


-1.851 


-1.504 


-0.199 


-0.112 


-0.112 


0.062 


0.670 


0.583 


0.583! 


0.583 


0.583 


0.583 


0.583 


CO 


1.580 


1.262 


6660 


1.073 


1.366 


1.789 


1.379 


1.148 


1.087 


0.775 


0.511 


0.549 


0.646 


0.104 


0.444 


0.520 


0.560 


1.503 


1.719! 


1.803! 


2.071 


1.763 


1.763 


1.849 


1.369 


1.369 


1.451 


3 o o 

rn CO *^ 


-0.50 


0.90 


0.90 


0.50 


0.50 


0.20 


-0.10 


-1.30 


-2.00 


-2.00 


-2.00 


-2.00 


-2.00 


-2.00 


-2.00 


-1.60 


-0.10 


0.00 


0.00 


0.20 


0.90 


0.80 


0.80 


0.80 


0.80 


0.80 


0.80 


< _ 


71.14 


68.97 


67.18 


67.68 


69.68 


72.56 


69.77 


68.19 


67.78 


65.65 


63.85 


64.11 


64.77 


61.08 


63.40 


63.91 


64.19 


70.61 


72.08


72.66 


74.49 


72.38 


72.38 


72.97


69.70 


69.70 


70.26 


SEQ ID 
NO: 


225


226


227


228


229


230


231


232


233


234


235


236


237


238


239


240


241


242


243


244


245


246


247


248


249


250


251


DNA Probe Sequence 


GGTAAGTCCCCACCTCAACA 


GTAAGTCCCCACCTCAACAG 


TAAGTCCCCACCTCAACAGA 


AAGTCCCCACCTCAACAGAT 


AGTCCCCACCTCAACAGATG 


GTCCCCACCTCAACAGATGT 


TCCCCACCTCAACAGATGTT 


CCCCACCTCAACAGATGTTG 


CCCACCTCAACAGATGTTGT 


CCACCTCAACAGATGTTGTC 


CACCTCAACAGATGTTGTCT 


ACCTCAACAGATGTTGTCTC 


CCTCAACAGATGTTGTCTCA 


CTCAACAGATGTTGTCTCAG 


TCAACAGATGTTGTCTCAGC 


CAACAGATGTTGTCTCAGCT 


AACAGATGTTGTCTCAGCTC 


ACAGATGTTGTCTCAGCTCC 


CAGATGTTGTCTCAGCTCCT 


AGATGTTGTCTCAGCTCCTC 


GATGTTGTCTCAGCTCCTCT 


ATGTTGTCTCAGCTCCTCTA 


TGTTGTCTCAGCTCCTCTAT 


GTTGTCTCAGCTCCTCTATT 


TTGTCTCAGCTCCTCTATTT 


TGTCTCAGCTCCTCTATTTT 


GTCTCAGCTCCTCTATTTTT 


p5 Probe 
Position 


81


82


83


84


85


86


87


88


89


90


91


92


93


94


95


96


97


98


99


100


101


102


103


104


105


106


107






HIV PRT 
GeneChip™ 
Data 


49.3 


55.0 


49.0 


45.7


115.6 


50.6 


48.0 


50.5 


44.1 


43.1 


45.2 


47.4 


50.0 


47.8 


50.2 


43.0 


57.0 


58.7 


183.6 


303.4 


135.7 


241.7 


132.6 


128.8 


141.0 


282.0 


948.6 


Window- 
Averaged 
Composite 
Score 
























695 0 




-0.867 


-1.022 


-1.096


-1.088


-1.072


S 


-0.924


-0.837 


-0.766


-0.737 


-0.758 








Composite 
Score 


0.096 


0.147 


0.091 


0.278 


0.560 


0.167 


0.202 


-0.124 


-0.242


-0.316


-0.442


-0.581


-0.581 


-0.739 


-1.080 


-1.275 


-1.371 


-1.524 


-1.102 


-0.524 


-0.628


-0.653 


-0.669 


-0.763 


-1.025 


-0.897 


-0.671 


ΔGmfold
Score 


0.496 


0.496 


0.496


0.670 


0.844 


0.236 


1.105 


1.105 


1.453 


1.540 


0.757 


0.323 


0.323 


0.323 


0.323 


0.323 


0.323 


0.323 


0.496 


1.366 


1.453 


1.279 


0.931 


0.931 


0.844 


0.844 


0.844 


H u 
CO 


-0.149 


-0.067 


-0.158 


0.037 


0.386 


0.125 


-0.352 


-0.877 


-1.281 


-1.454 


-1.178 


-1.135 


-1.135 


-1.390 


-1.940 


-2.254 


-2.408 


-2.655 


-2.081 


-1.682 


-1.903


-1.838 


-1.649 


-1.801 


-2.171 


-1.965 


-1.599 


3 o o 


0.70 


0.70 


0.70 


0.90 


1.10 


0.40 


o 


o 


o 
oc> 


o 


o 
o 


0.50 


0.50 


0.50 


0.50 


0.50 


0.50 


0.50 


0.70 


o 


o 

00 


o 

CO 


O 
CNJ 


o 

CNJ 


o 


o 


o 


< 


59.35 


59.91


59.29 


60.62 


63.00 


61.22 


57.97 


54.39


51.64


50.45 


52.34 


52.63 


52.63 


50.89 


47.14 


45.00 


43.95 


42.27 


46.18 


48.90 


47.39 


47.84 


49.12 


48.09 


45.57 


46.97 


49.46 


SEQ ID 
NO: 


846


847


848


849


850


851


852


853


854


855


856


857


858


859


860


861


862


863


864


865


866


867


868


869


870


871


872


DNA Probe Sequence 


TACAGTCTCAATAGGGCTAA 


ACAGTCTCAATAGGGCTAAT 


CAGTCTCAATAGGGCTAATG 


AGTCTCAATAGGGCTAATGG 


GTCTCAATAGGGCTAATGGG 


TCTCAATAGGGCTAATGGGA 


CTCAATAGGGCTAATGGGAA 


TCAATAGGGCTAATGGGAAA 


CAATAGGGCTAATGGGAAAA 


AATAGGGCTAATGGGAAAAT 


ATAGGGCTAATGGGAAAATT 


TAGGGCTAATGGGAAAATTT 


AGGGCTAATGGGAAAATTTA 


GGGCTAATGGGAAAATTTAA 


GGCTAATGGGAAAATTTAAA 


GCTAATGGGAAAATTTAAAG 


CTAATGGGAAAATTTAAAGT 


TAATGGGAAAATTTAAAGTG 


AATGGGAAAATTTAAAGTGC 


ATGGGAAAATTTAAAGTGCA 


TGGGAAAATTTAAAGTGCAA 


GGGAAAATTTAAAGTGCAAC


GGAAAATTTAAAGTGCAACC 


GAAAATTTAAAGTGCAACCA 


AAAATTTAAAGTGCAACCAA 


AAATTTAAAGTGCAACCAAT


AATTTAAAGTGCAACCAATC 


p5 Probe 
Position 


102


103


104


105


106


107


108


109


110


111


112


113


114


115


116


117


118


119


120


121


122


123


124


125


126


127


128




OO 



CM 



o 

CO 



OO 
CO 



o 
m 



CM 



o 
xr 
in 



CO 
CO 
CO 



S E 



O O 



o 

CD 



o 

Oi 
CD 



O 
CD 



o 
in 

o 



o 
in 



o 

CO 



o 
o 



o 

CD 



to 

CQ <^ 



CO 



in 



CO 
CO 

in 



CO 

CD 

in 



CO 
CD 

in 



m 



CM 
CO 

CO 

tn 



OO 
CM 



o 

CO 

in 



m 



CO 



CM 

in 



OO 
CO 



in 



CO 



CM 

CO 
CO 



OO 
OO 

CO 



oo 

LU Z 
CO 



ro 

CO 
CD 



00 



OO 



cn 

OO 
CD 



CM 
CD 
CD 



CD 
CD 



O 
O 

o 



CM 
O 
O 



O 
O 



o 
o 



0) 

o 
c 

3 
XT 
0> 
CO 
o 

o 

< 



< 

U 
O 

H 

U 



O 

Eh 

< 



O 
H 



1 

Eh 

U 
H 



0.2 
hi «f 

Q. CO 
« I? 



co 

OO 



CO 
CO 



OO 



OO 



CM 
OO 



XT 
CO 



CO 
CO 



o 
in 

CO 



CM 

in 

CO 



in 
in 

00 



m 

OO 



CM 
CO 
OO 



CO 
CO 
OO 



0) 

o 
o 
Q 
>* 

<D 
C 



4 



CD 

05 
O 



s 

I- 

? •= JS 
> g Q 

O 



o c 
5^ 



o ^ 



CM 



CO 



CO 



in 



CD 
CO 



CD 
CO 
CM 



CM 

00 



CD 



CO 

o 
CO 
m 



CO 

CM 



in 
ir> 



o 
od 

lO 



o 

CD 

o 
m 



in 

CO 

o 

CO 



o 



CO 
CM 



o 

CO 

CO 



CO 



CO 
CD 



CD 



CO 



XI- 
CO 
O 



CD 
CO 

in 

CD 



m 



CO 
CM 
CN 



o 
in 
o 

CD 



o 

< 



o 
m 



CN 



oo 
c> 



CO 
CO 
CM 

CD 



CM 



s o 
- o 

CO 



CO 
CD 
CD 



CO 
OO 
OO 



CD 
CO 



CD 
CO 

in 

CD 



in 

CO 



o 

CO 



CM 



o 
o 
o 



0) 



3 o O 
P P 

n CO 



o 



o 
00 



o 

CD 



o 

CD 



o 

O 
CD 



CM 



O 
O 

CD 



in 



m 

CN 

o 



in 



CD 
CD 



CD 



o 
in 



Oi 

o 



in 

Oi 
OO 



o 

CD 



CO 
CN 



CN 
CD 
Oi 



o 
CO 



Q 

< 



O 

o 



OO 
OO 



CO 



o 

CD 



CD 
CM 



CD 
CO 

oi 

CD 



CM 
O 



CD 



O 
CO 

CN 
CD 



UJ z 
CO 



o 
o 



rH 

O 



O 



t-A 

O 



CO 
CM 

1^ 
CO 



O 



CM 
O) 
CD 



CO 
CD 



CM 

o 



CO 



O 
CN 
CD 
CD CO 



CO 
CO 

in 

CO 



<J1 

CnJ 

o 



CO 

ro 
o 



co 



o 
c 

0> 
3 
O" 
4> 
CO 

o 

CI. 

< 



Eh 

H 

O 
U 
U 
O 

a 

H 



u 
o 
u 
o 
o 



H 
O 

Eh 



E-t 

8 

Eh 
Eh 

O 

u 
u 
u 
o 

Eh 



O 



6 

U 

Eh 

i 

Eh 
Eh 

U 
U 



o 



Q. CO 



CO 
CO 



CO 
CO 
00 



CD 
CD 
OO 



CO 
CO 



00 
OO 



o 

OO 
OO 



CM 
OO 
OO 



in 

CO 
CO 



o 
o 

H 
Eh 
U 
H 



U 

Eh 
Eh 

8 

Eh 

Eh 
H 
CO 
OO 
OO 



U 

o 
o 
e-« 
o 
o 

Eh 

u 

g 

o 

H 
Eh 
H 
O 
O 
H 
Eh 

O 
CD 
OO 



o 
o 
Q 

(D 
C 

o 



4 



o 



CO 



HIV PRT 
GeneChip™ 
Data 






16304, 


14885, 


11910, 


11929 


11517, 


11822 


11710, 


7635 


8378 


6321 


7659 


11621 


3389, 


3870 


1992 


[values illegible in the scanned table]


1663, 


2694. 


3222. 


3142. 


5867. 


6641. 


Window-Averaged Composite Score (values illegible in the scanned table)

Composite Score (values illegible in the scanned table)

[The remaining columns of this table page are illegible in the scanned source.]


SEQ ID NO: 1035 (subsequent entries illegible in the scanned table)


DNA Probe Sequence 


TGGTTTCCATCTTCCTGGCA 


GGTTTCCATCTTCCTGGCAA 


GTTTCCATCTTCCTGGCAAA 


TTTCCATCTTCCTGGCAAAC 


TTCCATCTTCCTGGCAAACT 


TCCATCTTCCTGGCAAACTC 


CCATCTTCCTGGCAAACTCA 


CATCTTCCTGGCAAACTCAT 


ATCTTCCTGGCAAACTCATT 


TCTTCCTGGCAAACTCATTT 


CTTCCTGGCAAACTCATTTC 


TTCCTGGCAAACTCATTTCT 


TCCTGGCAAACTCATTTCTT 


CCTGGCAAACTCATTTCTTC 


CTGGCAAACTCATTTCTTCT 


TGGCAAACTCATTTCTTCTA 


GGCAAACTCATTTCTTCTAA 


GCAAACTCATTTCTTCTAAT 


CAAACTCATTTCTTCTAATA 


AAACTCATTTCTTCTAATAC 


AACTCATTTCTTCTAATACT 


ACTCATTTCTTCTAATACTG 


CTCATTTCTTCTAATACTGT 


TCATTTCTTCTAATACTGTA 


CATTTCTTCTAATACTGTAT 


ATTTCTTCTAATACTGTATC 


TTTCTTCTAATACTGTATCA 


p5 Probe Position (values illegible in the scanned table)




HIV PRT 
GeneChip™ 
Data 


7151.9 


8134.9 


8551.4 


5741.7 


8575.9 


8980.3 


10762.0 


17037.0 


20970.0 


23084.0 


24474.0 


22217.0 


19829.0 


23548.0 


21759.0 


22711.0 


18134.0 


17772.0 


17134.0 


10969.0 


9556.3 


3739.9 


4088.3 


2263.0 


1018.0 


1319.1 


2347.8 


Window- 
Averaged 
Composite 
Score 
















9950 1 






0.875 


0.910 


0.890 


0.842 




























Composite 
Score 


-0.176 


-0.086 


-0.023 


-0.211 


0.140 


0.197 


0.140 


0.553 


0.938 


0.929 


1.065 


0.953 


0.763 


0.927 


0.799 


0.799 


0.589 


0.582 


0.614 


0.635 


0.262 


0.034 


0.003 


0.224 


0.074 


0.074 


0.074 


[column heading and first ten values illegible in the scanned table]


0.410 


0.410 


0.410 


0.410 


0.410 


0.410 


0.410 


0.757 


1.279 


1.279


0.323


0.149


-0.982 


0.149 


0.149 


0.149 


0.149 


Score 


-0.908 


-0.762


-0.661 


-0.964 


-0.398 


-0.307 


-0.398 


0.268 


0.889 


0.874 


1.466 


1.286 


0.979 


1.244 


1.037 


1.037 


0.699 


0.475


0.207


0.241 


0.225 


-0.036 


0.607


0.271 


0.028 


0.028 


0.028 


[column heading and first ten values illegible in the scanned table]


0.60 


0.60 


0.60 


0.60 


0.60 


0.60 


0.60 


1.00 


1.60 


1.60 


0.50 


0.30 


-1.00 


0.30 


0.30 


0.30 


0.30 


< 


54.17 


55.17 


55.86 


53.80 


57.65 


58.28 


57.65 


62.19 



66.43 


66.32 


70.36 


69.13 


67.04 


68.85 


67.44 


67.44 


65.13 


63.60 


61.77 


62.01 


61.90 


60.12 


64.50


62.21 


60.56 


60.56 


60.56


SEQ ID NO: (entries illegible in the scanned table)


DNA Probe Sequence 


TTCTTCTAATACTGTATCAT 


TCTTCTAATACTGTATCATC 


CTTCTAATACTGTATCATCT 


TTCTAATACTGTATCATCTG 


TCTAATACTGTATCATCTGC 


CTAATACTGTATCATCTGCT 


TAATACTGTATCATCTGCTC 


AATACTGTATCATCTGCTCC 


ATACTGTATCATCTGCTCCT 


TACTGTATCATCTGCTCCTG 


ACTGTATCATCTGCTCCTGT 


CTGTATCATCTGCTCCTGTA 


TGTATCATCTGCTCCTGTAT 


GTATCATCTGCTCCTGTATC 


TATCATCTGCTCCTGTATCT 


ATCATCTGCTCCTGTATCTA 


TCATCTGCTCCTGTATCTAA 


CATCTGCTCCTGTATCTAAT 


ATCTGCTCCTGTATCTAATA 


TCTGCTCCTGTATCTAATAG 


CTGCTCCTGTATCTAATAGA 


TGCTCCTGTATCTAATAGAG 


GCTCCTGTATCTAATAGAGC 


CTCCTGTATCTAATAGAGCT 


TCCTGTATCTAATAGAGCTT 


CCTGTATCTAATAGAGCTTC 


CTGTATCTAATAGAGCTTCC 


p5 Probe
Position

[The position values and the table pages that follow are illegible in the scanned source.]




Example 3 



Synopsis: The method of the present invention is particularly useful as a guide to
the iterative refinement of probes. One of the specific predictions made for rabbit
β-globin in Example 1 is used to provide an example of such a refinement.

Materials and Methods: The contig spanning positions 5-11 of a portion of the
rabbit β-globin gene (Example 1, Table 3) was analyzed, using the experimentally
measured data to simulate the results of successive experimental measurements.
The iterative refinement was performed using a rule-based algorithm, outlined
below. This algorithm is used by way of example only; other algorithms for
efficiently finding local maxima are well known in the art and could be employed to
perform this task.

Given experimental data for probes from the 1st quartile, median and 3rd
quartile of a contig, as well as a user-set signal threshold for further consideration
of a probe:

1) If all 3 measurements are below the user-specified signal threshold, discard 
the prediction. 

2) If at least one of the measurements is above the user-specified threshold, 
determine which point yields the maximum signal. 

a) If the maximum point is the 1st quartile probe, then make three new
measurements for probes with the same spacing as that used in the
preceding iteration, but displaced so that the third probe is identical to the
original 1st quartile probe. In other words, repeat the search with the same
pattern and spacing, but displace the pattern in the direction of increasing
signal found in the first experiment.

b) If the maximum point is the 3rd quartile probe, then make three new
measurements for probes with the same spacing as that used in the
preceding iteration, but displaced so that the first probe is identical to the
original 3rd quartile probe. In other words, repeat the search with the same
pattern and spacing, but displace the pattern in the direction of increasing
signal found in the first experiment.

c) If the maximum point is the median probe, then repeat the experiment,
keeping the median point the same, but shrinking the spacing between
probes by a factor of 2.

3) Continue iteration until a maximum is found, or the user judges the signal level 
observed to be acceptable. Use the experimental value measured for the 
probe duplicated in successive iterations to tie together the successive data 
sets, via a simple normalization procedure, described below. Where 
appropriate, consider all of the data (i.e. all of the iterations) when deciding 
how to proceed, or whether the peak hybridization intensity has been found. 
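
The rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software of the Appendix: the function name `refine_probe`, the `measure` callback, and the stopping test (the best probe seen so far has a lower measured signal at both adjacent positions) are hypothetical choices made for the example, and the intensities returned by `measure` are assumed to be already normalized across iterations.

```python
from typing import Callable, Dict, Optional, Tuple


def refine_probe(
    measure: Callable[[int], float],
    start: Tuple[int, int, int],
    threshold: float,
    max_iter: int = 20,
) -> Optional[Tuple[int, float, Dict[int, float]]]:
    """Search for a local signal maximum with three evenly spaced probes.

    Returns (best position, its signal, all measurements), or None when the
    prediction is discarded under rule 1.
    """
    first, median, third = start
    spacing = median - first          # assumes evenly spaced starting probes
    seen: Dict[int, float] = {}       # every (normalized) measurement so far

    for _ in range(max_iter):
        positions = (first, median, third)
        for p in positions:
            if p not in seen:         # the shared "bridge" probe is reused
                seen[p] = measure(p)
        signals = [seen[p] for p in positions]

        # Rule 1: no probe reaches the threshold, so discard the prediction.
        if max(signals) < threshold:
            return None

        # Rule 3: consider all data; stop once the best probe found so far
        # has measured, lower signals at both adjacent positions.
        best = max(seen, key=seen.get)
        left, right = seen.get(best - 1), seen.get(best + 1)
        if left is not None and right is not None and left < seen[best] > right:
            return best, seen[best], seen

        top = signals.index(max(signals))
        if top == 0:      # Rule 2a: old first probe becomes the new third probe
            first, median, third = first - 2 * spacing, first - spacing, first
        elif top == 2:    # Rule 2b: old third probe becomes the new first probe
            first, median, third = third, third + spacing, third + 2 * spacing
        else:             # Rule 2c: keep the median, halve the spacing
            spacing = max(1, spacing // 2)
            first, third = median - spacing, median + spacing

    # Iteration budget exhausted: report the best probe measured so far.
    best = max(seen, key=seen.get)
    return best, seen[best], seen
```

A call such as `refine_probe(measure, (6, 8, 10), threshold=100)` starts from three probes spaced two positions apart; the threshold value here is arbitrary and would be set by the user.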

Results: Iterative refinement of the contig spanning positions 5-11 in Table 3
proceeds as follows:

Iteration 1: Probes are synthesized at positions 6, 8 and 10, yielding the 
experimental hybridization intensities 180, 220 and 310, respectively. 

Iteration 2: Following rule 2b), probes are synthesized at positions 10, 12 and 14.
Note that the redundant measurement at position 10 serves as a bridge between
experiments. It allows the two data sets to be compared by multiplying the second
iteration measurements by the ratio of the intensity observed for the probe at
position 10 in the first iteration to the value observed in the second iteration.
In the simplest case, the ratio is 1; in any case, the second iteration yields the
normalized values 310, 390 and 240 for probe positions 10, 12 and 14, respectively.
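
The bridging normalization can be illustrated with a short Python sketch. The function name `normalize_iteration` and the raw second-iteration readings are hypothetical; the raw values are not given in the source and were chosen only so that the normalized results match the values 310, 390 and 240 stated above.

```python
from typing import Dict


def normalize_iteration(
    raw: Dict[int, float], bridge_pos: int, reference: float
) -> Dict[int, float]:
    """Rescale one iteration so its bridge probe matches the earlier reading."""
    # Ratio of the earlier intensity to the new intensity at the shared probe.
    ratio = reference / raw[bridge_pos]
    return {pos: value * ratio for pos, value in raw.items()}


# Hypothetical raw readings for iteration 2 (assumed, not from the source).
raw_second = {10: 155.0, 12: 195.0, 14: 120.0}

# Position 10 read 310 in iteration 1, so every value is scaled by 310 / 155.
normalized = normalize_iteration(raw_second, bridge_pos=10, reference=310.0)
```

When the two readings at the bridge probe agree, the ratio is 1 and the second data set is used unchanged.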

Iteration 3: By rule 2c), measurements are performed for probes at positions 11,
12 and 13; after normalization, these yield the hybridization intensities
320, 390 and 410, respectively. Combining these results with the result from
iteration 2 for probe position 14 leads to the conclusion that the best probe for this
intensity peak is the probe that starts at sequence position 13.

The overall result is that iterative improvement converges in three iterations and
requires the synthesis of seven test probes, one of which is the locally optimal
probe. In addition, the first and second iterations yield probes that exhibit 75%
and 95% of the local maximum hybridization intensity, respectively. In many
applications, either of these probes would be considered acceptable.
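
The probe count and the percentages quoted above can be checked from the intensities reported for the three iterations. The Python check below is only an illustration and its variable names are not from the source; the ratios come out near 0.756 and 0.951, consistent with the rounded 75% and 95% stated in the text.

```python
# Normalized intensities reported in Iterations 1-3, keyed by probe position.
intensities = {6: 180, 8: 220, 10: 310, 12: 390, 14: 240, 11: 320, 13: 410}

peak = max(intensities.values())                         # local maximum (position 13)
best_iter1 = max(intensities[p] for p in (6, 8, 10))     # best probe of iteration 1
best_iter2 = max(intensities[p] for p in (10, 12, 14))   # best probe of iteration 2

probes_synthesized = len(intensities)   # distinct probes across all iterations
fraction_iter1 = best_iter1 / peak      # share of the peak reached in iteration 1
fraction_iter2 = best_iter2 / peak      # share of the peak reached in iteration 2
```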

Examples 1 and 2 above demonstrate that two different
implementations of the method of the present invention are capable of efficiently 
predicting regions of high hybridization efficiency in a variety of polynucleotide 
targets. Many of the predictions yield acceptable probe sequences on the first 
design iteration, and all would yield optimized probe sets after 2-4 rounds of 
iterative refinement, as demonstrated in Example 3. The performance 
demonstrated in these examples greatly exceeds the performance of current 
methods. Finally, the examples demonstrate that the predictions can be 
performed by a software application that has been implemented and installed on a 
Pentium®-based computer workstation. 

All publications and patent applications cited in this specification are 
herein incorporated by reference as if each individual publication or patent 
application were specifically and individually indicated to be incorporated by 
reference. 

Although the foregoing invention has been described in some detail by 
way of illustration and example for purposes of clarity of understanding, it will 
be readily apparent to those of ordinary skill in the art in light of the teachings of 
this invention that certain changes and modifications may be made thereto 
without departing from the spirit or scope of the appended claims. 